Introduction


The 37th Annual Sanibel Symposium, organized by the faculty, students,
and staff of the Quantum Theory Project of the University of Florida,
was held on March 1-7, 1997. The meeting was again held at the Ponce de
Leon Conference Center in St. Augustine, Florida.

The symposium followed the established format with plenary and poster
sessions. This year, the schedule was shortened somewhat with a compact
seven-day integrated program of quantum biology, quantum chemistry, and
condensed matter physics. The topics of the sessions covered by these
proceedings include Quantum Biology, Quantum and Classical Molecular
Dynamics, Protein Structure and Folding, Monte Carlo Simulations, and
Free Energy Calculations of Biological Molecules.

The articles have been subjected to the ordinary refereeing procedures
of the International Journal of Quantum Chemistry. The articles
presented in the sessions on quantum chemistry, condensed matter
physics, and associated poster sessions are published in a separate
issue of the International Journal of Quantum Chemistry.

The organizers acknowledge the following sponsors for their support of
the 1997 Sanibel Symposium:

■ Army Research Office and U. S. Army Edgewood RD&E Center through
Grant #DAAG55-97-1-0020:

"The views, opinions, and/or findings contained in the report are those
of the author(s) and should not be construed as an official Department
of the Army position, policy, or decision, unless so designated by
other documentation."

■ The Office of Naval Research through Grant #N00014-97-1-0320:

"This work relates to Department of the Navy Grant #N00014-97-1-0320
issued by the Office of Naval Research. The United States Government
has the royalty-free license throughout the world in all copyrightable
material contained herein."

■  IBM  Corporation. 

■  Hypercube,  Inc. 

■  Silicon  Graphics. 

■  The  University  of  Florida. 

Very  special  thanks  to  the  staff  of  the  Quantum 
Theory  Project  of  the  University  of  Florida  for 
handling  the  numerous  administrative,  clerical, 
and  practical  details.  The  organizers  are  proud  to 
recognize  the  contributions  of  Mrs.  Judy  Parker, 
Ms.  Sharon  Stellato,  Ms.  Sandra  Weakland,  and 
Mr.  Greg  Pearl.  All  the  graduate  students  of  the 
Quantum  Theory  Project,  who  served  as  "gofers," 
are  gratefully  recognized  for  their  contributions  to 
the  1997  Sanibel  Symposium. 

N.  Y.  Ohrn 
J.  R.  Sabin 
M.  C.  Zerner 
Ab Initio and Molecular Mechanics
Conformational Analysis of Neutral
L-Proline


MICHAEL RAMEK,1 ANNE-MARIE KELTERER,1 SONJA NIKOLIĆ2

1 Institut für Physikalische und Theoretische Chemie, Technische
Universität Graz, A-8010 Graz, Austria

2 Ruđer Bošković Institute, P.O.B. 1016, HR-10000 Zagreb, Croatia
ABSTRACT: The energetically low-lying parts of the potential energy
surface of L-proline were investigated by ab initio (RHF/6-311++G**)
calculations. The results are discussed with respect to the
parametrization of the MM3 force field and in comparison with those
obtained earlier for glycine and α-alanine, on the one hand, and for
N-acetyl-L-proline amide, on the other hand. © 1997 John Wiley & Sons,
Inc. Int J Quant Chem 65: 1033-1045, 1997


Introduction 

Amino acids have often been the target of quantum chemical structure
investigations for a number of reasons. One of these reasons is the
fact that amino acids form zwitterions in the solid state and in polar
media, whereas the neutral form is more stable for isolated molecules.
Most of the routine experimental techniques for structure
determination, like X-ray crystallography, offer easy access to the
zwitterionic structure. Experimental structure studies of the neutral
form require much more sophisticated techniques. Quantum chemical
structure investigations are a perfect tool in this situation, since
their basic approach is to consider a single, isolated molecule. The
case of glycine, in which the structure of the most stable neutral
conformer was first predicted on the basis of ab initio calculations
[1] and, later, based on the predicted values, confirmed experimentally
[2, 3], demonstrated this advantage of quantum chemical methods almost
20 years ago. Another reason for theoretical structure investigations
of amino acids is, of course, the biological importance of amino acids
as the building blocks of peptides and proteins and the need to
understand and to describe correctly the weak interactions between the
functional groups in these molecules as a prerequisite for
computerized molecular modeling of this class of compounds.

Correspondence to: M. Ramek.

High-level quantum chemical ab initio calculations were reported
recently for a number of amino acids, especially for glycine, alanine,
serine, and cysteine [4-8], which all have simple side chains. For
proline, in which the side chain is linked to the amino nitrogen atom,
an early ab initio study was performed a decade ago by Sapse et al. [9].
Most of that work was based on data obtained with the STO-3G basis set
and subject to several artificial restrictions on the geometry
parameters, which were seemingly necessary to be able to perform the
calculations with the computational resources available at that time.
As a consequence of these restrictions, the five-membered pyrrolidine
ring of proline was predicted to be planar in the free acid as well as
in the N-formyl and the N-acetyl amide.

Recently, the neutral form of proline has become the focus of
experimentally dominated interest: Using a matrix isolation technique,
the molecular structure was experimentally confirmed to be
nonzwitterionic [10], and the vibrational frequencies of proline and
hydroxyproline were evaluated with the help of ab initio-optimized
geometries [11]. Part of that interest in proline is, of course, due to
the unique nature of proline among the naturally occurring amino acids,
which is caused by the ring structure with only one hydrogen atom
bonded to the nitrogen atom. Since this hydrogen atom is eliminated
when proline becomes part of a peptide chain, proline has only limited
possibilities to form the hydrogen bonds that stabilize the helices and
sheet structures in peptides and proteins. Hence, proline is a
delimiter of these structural units, which makes it important for the
secondary structure of proteins and peptides [12-16].

We investigated the potential energy surface (PES) of L-proline using
various theoretical methods. The results of previous RHF calculations
on neutral ω-amino acids [17-19] and ω-hydroxy acids [20-22] showed
that conformers with a syn-periplanar orientation of the groups C=O and
O-H are 35-40 kJ/mol more stable than are those with the groups C=O and
O-H in anti-periplanar orientation [23]. In view of this, the present
study was focused on conformers with a syn-periplanar arrangement of
the COOH group; only two conformers with an anti-periplanar COOH group
were included, namely, those that form an intramolecular N···H-O
hydrogen bond.

The core of this work is a description of these energetically low-lying
parts of the PES in terms of local minima and reaction paths as
obtained via RHF [24] calculations with the standard basis set
6-311++G** [25-27]. In addition, the PES was searched for local minima
by molecular mechanics calculations using the MM3(94) force field [28].
This is of special interest because the torsion parameter N-C-C-O (type
8-1-3-75), which is important for the description of the position of
the carboxyl group relative to the amino group in neutral amino acids,
is missing in the MM3(94) parameter set.


Computational  Details 

At the ab initio level, all stationary geometries were fully optimized
to remaining maximum and root mean square (rms) gradients less than
1.0 × 10⁻⁴ and 0.33 × 10⁻⁴ hartree/bohr, respectively. The nature of
all stationary geometries was confirmed by calculating the eigenvalues
of the Hessian matrix; in accordance with the computational resources
available to the authors, the Hessian matrix was computed by double
numerical differentiation of analytical first derivatives. The program
GAMESS [29] was used for the ab initio calculations on a number of
machines.
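The convergence and characterization criteria above can be sketched in a few lines. This is only an illustration, not the GAMESS implementation: the function names are hypothetical, the gradient is assumed to be a flat list of Cartesian components in hartree/bohr, and the Hessian eigenvalues are assumed to have translations and rotations already removed.

```python
import math

# Thresholds quoted in the text (hartree/bohr).
MAX_GRAD_TOL = 1.0e-4
RMS_GRAD_TOL = 0.33e-4


def is_converged(gradient):
    """True if the largest component and the rms of the gradient are below the thresholds."""
    max_grad = max(abs(g) for g in gradient)
    rms_grad = math.sqrt(sum(g * g for g in gradient) / len(gradient))
    return max_grad < MAX_GRAD_TOL and rms_grad < RMS_GRAD_TOL


def classify_stationary_point(eigenvalues, zero_tol=1.0e-6):
    """Classify a stationary point from the signs of its Hessian eigenvalues.

    zero_tol is an assumed numerical tolerance, not a value from the paper.
    """
    n_negative = sum(1 for e in eigenvalues if e < -zero_tol)
    n_zero = sum(1 for e in eigenvalues if abs(e) <= zero_tol)
    if n_zero > 0:
        return "degenerate (e.g., stationary point of inflection)"
    if n_negative == 0:
        return "local minimum"
    if n_negative == 1:
        return "transition state"
    return "higher-order saddle point"
```

A local minimum has only positive eigenvalues and a transition state has exactly one negative eigenvalue; a vanishing eigenvalue corresponds to the kind of stationary point of inflection discussed later for conformer 8.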

The MM3(94) force-field program was used on a DEC Alpha 3000-900
workstation. The parameter sets implemented in the program are dated
10-Jan-94. In addition, the hydrogen-bond parameters (type 23-77) were
added with the MM3(96) values of ε = 1.1 kcal/mol and r = 2.39 Å [30].
To find the local minima, the stochastic search routine was used;
starting from the located minima, the potential energy curves were
traced with the dihedral driver by block diagonal Newton-Raphson
minimization. The energy gradients were less than 1 × 10⁻⁵ kcal/mol.
For the characterization of local minima and transition states, the
vibrational spectra were calculated with the full Newton-Raphson
method. Structures with "restricted internal rotations" below 20 cm⁻¹
were ignored because of the flat nature of the PES in the respective
region. The atom labeling used throughout this work is defined in
Figure 1.


FIGURE 1. Definition of atom labels used in this work.

RHF Results

Ten conformers were located in the PES of neutral L-proline, which are
labeled 1, 2, ..., 10 according to the energy. The geometries of these
conformers are sketched in Figure 2 and the geometry data (bond
lengths, valence, and torsion angles) and the energies are collected in
Table I.

Due to the restrictions imposed by the pyrrolidine ring, these 10
conformers may be characterized by only four torsion angles: First, the
torsion H-O-C=O, which, as already mentioned, can take values around 0°
and 180°; second, one of the torsions N-C-C-O, which define the
orientation of the carboxyl group relative to the ring; third, one of
the torsions N2-C3-C6-C7 or C3-C6-C7-C8 to describe the pucker-up or
pucker-down orientation of the pyrrolidine ring [31] (which are also
known as Cγ-exo and Cγ-endo [32], or N and S [33], respectively); and,
finally, the orientation of the amino group hydrogen atom that may
occupy two different positions relative to the substituents of the
asymmetric carbon atom C3.

FIGURE 2. Sketches and nomenclature of the L-proline conformers
discussed in this work.

The 6-311++G** results given here were obtained in a stepwise fashion:
First, the PES was searched with a small split-valence basis set
(4-31G). The results obtained in this search were then used as initial
data in 6-31G* calculations, and the results of these were used as the
initial values for the 6-311++G** calculations. This stepwise
technique, the primary intent of which was an economic use of
computational facilities, revealed several interesting basis-set
dependencies. One of these regards the conformer pair 1/2: With the
4-31G basis set, the relative stabilities of 1 and 2 are interchanged,
i.e., 2 is the global minimum with this basis set and 1 has a relative
energy of 1.497 kJ/mol. With the 6-31G* basis set, 1 is more stable by
1.614 kJ/mol, and with the 6-311++G** basis set, 1 is more stable by
2.264 kJ/mol. This change in relative energy is accompanied by changes
in the optimized structures; the largest deviations occur for the
torsion angle H1-N2-C3-C4, which has the following optimized values in
1: -26.15° with 4-31G, -21.96° with 6-31G*, and -16.57° with
6-311++G**. The vibrational spectra analysis of proline and
hydroxyproline, which was mentioned above, is based on 4-21G-optimized
geometries; in the case of proline, an optimized value of -41.38° is
reported [11] for the H1-N2-C3-C4 torsion. Since these values show a
remarkable trend, we also optimized the structures of 1 and 2 with
several other standard basis sets to check the adequacy of our
basis-set choice. The resulting values for the torsion H1-N2-C3-C4 and
the energy difference between conformers 1 and 2 are displayed in
Figure 3. These values show clearly that in the present case the
inclusion of diffuse functions is of minor importance for the
energetics, but of significant importance for structural details.

Another type of basis-set dependence occurs for conformer 8: This
conformer is a true local minimum in the 4-31G PES, but a stationary
point of inflection with the basis sets 6-31G* and 6-311++G**. Such
stationary points can be considered as local minima that have one
reaction path without a potential barrier [34]; in the case of 8, it is
the reaction leading to 5.

Still another type of basis-set dependence concerns the conformation
with torsion angles H1-N2-C3-C4 = -42.95°, N2-C3-C4-O5 = 44.37°,
N2-C3-C6-C7 = 0.57°, and O9=C4-O5-H11 = 0.12°. In the 4-31G PES, this
conformation is a true local minimum (11) with distinct saddle points
in reaction paths to 1, 6, 7, and 8; with either 6-31G* or 6-311++G**,
however, no local minimum exists with a similar conformation. All
geometry optimizations starting in that region of the PES converge to
6. Calculations including electron correlation via Møller-Plesset
second-order perturbation theory show the same behavior: Two local
minima (6 and 11) exist with the 4-31G basis set, but only one (6) with
the 6-31G* and the 6-311++G** basis sets.

The reaction paths, which interconnect the 10 local minima described
above, are listed in Table II. In all reaction paths, except those that
involve an internal rotation of the -OH group, the internal rotation of
the -COOH group is coupled either with an inversion of the amino group
or with a ring pucker motion. Figures 4 and 5 show characteristic
examples of this coupling, which is caused by attractive interactions
N-H···O=C (1 and 2) and N-H···O-C (6 and 7). The H···O distances are
2.391, 2.353, 2.379, and 2.472 Å in 1, 2, 6, and 7, respectively.

FIGURE 3. Basis-set dependence of the optimized value of the
H1-N2-C3-C4 torsion in conformer 1 (top) and the energy difference
between conformers 2 and 1 (bottom). Values for basis sets with an
identical number of polarization and diffuse functions are connected by
straight lines; basis sets are ordered (on a nonlinear scale) according
to decreasing RHF energy.

TABLE II
Reaction paths and potential barriers (kJ/mol) in the RHF/6-311++G**
PES of L-proline.

Initial     Final       Reaction path   Barrier
conformer   conformer   description
1           2           e                4.11
1           6           c and e         13.29
2           1           e                1.84
2           5           c and f         11.42
2           9           d and f         13.76
3           4           e                8.17
3           5           a and d         44.96
3           9           b and c         45.66
4           3           e                6.87
4           10          b and c         44.80
4           8           a and d         46.58
5           2           d and f          7.15
5           3           a and d         41.74
5           6           f               13.52
5           8           e                3.52
6           1           c and e          4.94
6           5           f               11.52
6           7           c and e          0.79
6           8           e and f         13.83
6           9           c and f         13.74
7           10          f               14.48
7           6           d and e          0.56
8           4           b and c         42.05
8           5           e                -
8           6           e and f         13.20
9           2           c and f          6.82
9           3           a and d         41.94
9           6           d and f         12.89
9           10          e                6.94
10          4           a and d         38.56
10          7           d and f         12.37
10          9           e                5.45

Internal rotations are labeled as follows: a, decrease of torsion
H11-O5-C4-C3; b, increase of torsion H11-O5-C4-C3; c, decrease of
torsion N2-C3-C4-O5; d, increase of torsion N2-C3-C4-O5; e, ring pucker
motion; f, amino group inversion.
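Because the forward and reverse reactions of a pair in Table II pass through the same transition state, the difference of the two barriers gives the energy difference between the two conformers. The short sketch below works this out for three pairs; the dictionary layout and the function name are illustrative choices, while the barrier values are taken from Table II.

```python
# Barriers in kJ/mol, keyed by (initial conformer, final conformer), from Table II.
barriers = {
    (1, 2): 4.11, (2, 1): 1.84,
    (1, 6): 13.29, (6, 1): 4.94,
    (6, 7): 0.79, (7, 6): 0.56,
}


def relative_energy(table, initial, final):
    """E(final) - E(initial), assuming both directions share one transition state."""
    return round(table[(initial, final)] - table[(final, initial)], 2)


for pair in [(1, 2), (1, 6), (6, 7)]:
    print(pair, relative_energy(barriers, *pair))
```

For the pair 1/2 this gives 2.27 kJ/mol, which matches, within rounding, the 2.264 kJ/mol by which 1 is more stable than 2 at the 6-311++G** level.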

The bond orders [35] of these N-H···O interactions are less than 0.01
in all four cases, nor do the calculated N-H vibration frequencies show
a significant deviation pattern; the N-H···O interactions, hence,
cannot be classified as true hydrogen bonds. They are, however, strong
enough to prevent the existence of four analogous conformers with an
inverted amino group and thus explain the difference between the actual
number of conformers with syn-periplanar orientation of the groups C=O
and O-H, which is 8, and an estimate based on the number of degrees of
freedom for the individual torsions (two for puckering, two for amino
group inversion, three for internal rotation of the -COOH group), which
anticipates 2 × 2 × 3 = 12 structures.
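The estimate of 12 structures is simply the number of combinations of the three independent choices. A minimal enumeration is sketched below; the labels for the amino hydrogen positions and the -COOH rotamers are placeholders introduced here, not nomenclature from the paper.

```python
from itertools import product

pucker = ("up", "down")
amino_h_position = ("position A", "position B")             # placeholder labels
cooh_rotamer = ("rotamer 1", "rotamer 2", "rotamer 3")     # placeholder labels

# Every combination of ring pucker, amino hydrogen position, and -COOH rotamer.
candidates = list(product(pucker, amino_h_position, cooh_rotamer))

n_estimated = len(candidates)        # 2 * 2 * 3 combinations
n_found = 8                          # syn-periplanar conformers located at the RHF level
n_suppressed = n_estimated - n_found
print(n_estimated, n_found, n_suppressed)
```

The difference of four corresponds to the four inverted-amino analogs of 1, 2, 6, and 7 that the N-H···O interactions prevent from existing as separate minima.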


MM3  Results 

The torsional energy of the MM3 force field, which turned out to be the
largest energy component for neutral L-proline, is defined as a sum of
terms:

$$E = \sum_i \left[ \frac{V_1}{2}\,(1 + \cos\omega) + \frac{V_2}{2}\,(1 - \cos 2\omega) + \frac{V_3}{2}\,(1 + \cos 3\omega) \right],$$

with ω being the dihedral angle for four bonded atoms A-B-C-D and V1,
V2, and V3 being the first-, second-, and third-order torsional
constant, respectively. Similar expressions are also used in other
force fields (e.g., MMFF94 [36] and MSX [37]); hence, the following
discussion is of relevance beyond the specific case of MM3(94). The
nitrogen atom of the pyrrolidine ring is in a pyramidal conformation,
so the torsional angle N-C-C-O is of type 8-1-3-75 in the MM3
nomenclature. This torsional parameter type is not present in amides or
peptides [38-40], and, obviously, for that reason, MM3(94) has not yet
been parametrized for this type of torsion.
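The single-torsion term of this expression is easy to evaluate directly. The sketch below is a plain transcription of the three-term cosine series for one dihedral angle; the function name and the numerical constants in the usage lines are illustrative assumptions and are not taken from any MM3 parameter file.

```python
import math


def torsion_energy(omega_deg, v1, v2, v3):
    """Three-term torsional energy for one dihedral angle given in degrees.

    The result has the same energy units as v1, v2, and v3.
    """
    w = math.radians(omega_deg)
    return (0.5 * v1 * (1.0 + math.cos(w))
            + 0.5 * v2 * (1.0 - math.cos(2.0 * w))
            + 0.5 * v3 * (1.0 + math.cos(3.0 * w)))


# Illustrative constants only (hypothetical values).
for omega in (0.0, 90.0, 180.0):
    print(omega, round(torsion_energy(omega, 1.0, 2.0, 3.0), 6))
```

The V1 and V3 terms are largest at ω = 0° and vanish at 180°, whereas the V2 term peaks at ±90°, so a large positive V2 favors planar (0° or 180°) arrangements of the four atoms.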

Different sets of parameters were used to locate the minima of the
proline PES. The parameter sets used are collected in Table III, and
the resulting positions of the local minima are shown in Figures 6 and
7. The first two parameter sets, labeled I and II in the following,
were straightforward attempts based on the strategy that the
interesting parameter can be compared with other parameters for the
same central atoms A-B-C-D, because the nature of terminal atoms A and
D is considered to be of minor influence in building a model compound
for force field parametrization [41]. The types 9-1-3-75 [i.e.,
N(sp2)-C-C-O], 8-1-3-9 [i.e., N-C-C-N(sp2)], and 8-1-3-7 (i.e.,
N-C-C=O) have all parameters equal to zero, so V1 = V2 = V3 = 0 was our
choice for parameter set I. Set II was chosen from the parameters for
nitrogen atoms with similar behavior, namely, sp3 hybridization, so it
employs the data for type 39-1-3-7 (i.e., N⁺-C-C=O) [42]. Parameter set
III is based on the estimation routine, which is implemented in MM3(94)
for undefined parameters [43]. This routine performs ad hoc comparisons
with torsional barriers of fragments with the same central atoms B-C
and similar geometrical behavior.

FIGURE 4. The inversion of the amino group is coupled with the internal
rotation of the -COOH group through intramolecular N-H···O interactions
in 1 and 7 as well as 6 and 2, as shown here for the reactions
involving the latter pair. (O) Local minima; (•) transition states.

Eight minima with syn-periplanar groups C=O and O-H could be located
with all of these three parameter sets, whereby the parameter choice
had almost no effect on the optimized structures in most cases.
Comparison of these local minima with the ab initio conformers shows
that only two of them are similar in structure. These comparable
structures are 1 and 8. For 8, the MM3-optimized H-N-C-C dihedral angle
is -71.7°, which is 16° larger than the ab initio value; for 1, the
N-C-C-O torsion angles differ by 20° and the H-N-C-C torsions differ by
8°; and the other torsional parameters differ by less than 3° in both
cases. With all three parameter sets, the global minimum has a geometry
with dihedral angles H-N-C-C = -39.7° and N-C-C-O ≈ 70° (70.7° with
parameter set I, 69.5° with set II, and 68.9° with set III), the
pyrrolidine ring being in a pucker-down orientation. This geometry is
somewhat similar to the ab initio conformer 6, but the N-C-C-O torsion
is about 35° too large with all three parameter sets; thus, the global
minimum is not really comparable to 6. The other geometries are not
comparable with any of the ab initio structures, as documented in
Figure 6.

FIGURE 5. The ring-puckering motion is coupled with the internal
rotation of the -COOH group in many reaction paths, as shown here for
the reactions 7 ⇌ 6 ⇌ 1. (O) Local minima; (•) transition states.

Since these straightforward parameter guesses are all equally unable to
describe even the global minimum correctly, the necessity of a more
elaborate estimate for the parameter N-C-C-O on the basis of ab initio
results is obvious. In the case of proline, this also poses problems,
because the internal rotations of the carboxyl group are all coupled
with amino group inversions or ring pucker motions; hence, a "pure"
N-C-C-O rotation profile cannot be extracted. Calculating the missing
torsional parameter via a least-squares fit of energy profiles of a
model compound containing this specific torsional angle is a well-known
method [41]. The 6-31G** basis set is regarded as sufficient to fit the
MM3 force-field geometries [44]. Since the smallest neutral amino acid,
glycine, is not suitable because of its symmetrical energy profiles,
the energy profiles of the pure N-C-C-O rotation of alanine [45]
between conformers 1, 4, 5, 6, 7, 8, and 9 of [7] were used to fit the
data of parameter set IV, which are listed in Table III.
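The fitting step can be illustrated with a linear least-squares problem, because the torsional energy is linear in V1, V2, and V3. The sketch below is not the authors' fitting procedure: it fits a synthetic profile generated from assumed constants, ignores all other force-field terms and any constant offset, and uses hypothetical function names. In practice the target would be the difference between the ab initio profile and the force-field energy computed without the torsion in question.

```python
import math


def torsion_basis(omega_deg):
    """Basis functions multiplying V1, V2, and V3 in the torsional energy."""
    w = math.radians(omega_deg)
    return (0.5 * (1.0 + math.cos(w)),
            0.5 * (1.0 - math.cos(2.0 * w)),
            0.5 * (1.0 + math.cos(3.0 * w)))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_torsion_constants(angles_deg, energies):
    """Least-squares V1, V2, V3 via the normal equations (A^T A) v = A^T e."""
    rows = [torsion_basis(a) for a in angles_deg]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    ate = [sum(r[i] * e for r, e in zip(rows, energies)) for i in range(3)]
    return solve_linear(ata, ate)


# Synthetic profile on a 10-degree grid, generated from assumed constants.
angles = list(range(0, 360, 10))
assumed_v = (0.4, 3.0, 0.9)
profile = [sum(v * f for v, f in zip(assumed_v, torsion_basis(a))) for a in angles]
fitted = fit_torsion_constants(angles, profile)
print([round(v, 6) for v in fitted])
```

With noise-free synthetic data the fit returns the assumed constants; with real profiles the residual of the fit indicates how well a pure three-term series can represent the rotation.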
TABLE III
Torsional constants (kcal/mol) used in the MM3 calculations for torsion
type 8-1-3-75.

Parameter set   V1      V2      V3
I               0.000   0.000   0.000
II              0.000   0.000   0.100
III             0.000   0.160   0.090
IV              0.398   3.138   0.927

With this parameter set IV, we were able to locate 12 local minima with
syn-periplanar orientation of the groups C=O and O-H, the positions of
which are shown in Figure 7. The structure of the MM3 global minimum
agrees well with that of the RHF global minimum, and there are more
structures that are similar to RHF conformers within a 25° tolerance in
the torsion angles: Similarities occur for conformers 1, 2, 6, 7, 8,
and 10. In the case of 2, the MM3 structure has, however, an almost
planar ring, and in the case of 6, there are two MM3 structures that
differ in the N2-C3-C6-C7 torsion by approximately 15°. The remaining
six minima, however, do not match RHF conformers, and the coupling of
internal coordinates in many reaction paths is also not reproduced with
this force field. Regarding the hydrogen-bonded conformers 3 and 4,
parameter set IV is capable of locating one structure with an N···H
distance of 2.18 Å, which deviates from 3 in the N-C-C-O torsion by
about 50°. Hence, parameter set IV also does not lead to respectable
results.


Discussion 

One item of interest is certainly the intramolecular N···H-O hydrogen
bond in conformers 3 and 4. According to the RHF/6-311++G** results,
the distances of these hydrogen bonds are 2.003 Å in 3 and 2.032 Å in
4, with bond orders of 0.019 and 0.030, respectively. These data give
an unclear picture of the relative strength of the hydrogen bonds:
According to the distance, the H bond in 3 is stronger, whereas
according to the bond order, the one in 4 is stronger. Elongation of
the O-H bond and change in the calculated O-H frequency (3991.8 cm⁻¹ in
3 and 4004.1 cm⁻¹ in 4, which compare to a mean value of 4118.0 cm⁻¹
for conformers 1, 2, 5, 6, 7, 8, 9, and 10) both indicate the H bond in
3 to be stronger. A different judgment of H-bond strength according to
various criteria was observed earlier for ω-amino acids [46, 47] and
ω-hydroxy acids [48]. However, in these cases, the H-bonded cycles were
eight- or nine-membered and, hence, sterically much more complex than
in the present case. The present case of a five-membered, almost planar
cycle closed by the H bond is, however, restricted by the pyrrolidine
ring of the proline molecule. This restriction is, e.g., manifested by
the potential barriers of the ring pucker motion, which are highest in
the pair 3 and 4 (cf. Table II), and is the most likely reason for the
distance/bond-order discrepancy in proline, since a possible electronic
interaction C7-H···O in 3 can be ruled out due to the distance of 3.4 Å
between these atoms. A comparison with those forms of glycine and
α-alanine, which contain the same H-bonded cycle, should give more
insight. Geometries of glycine and α-alanine, which were optimized at
RHF and MP levels with the same basis set as used in the present study,
are described in [6]; unfortunately, the distances between the
hydrogen-bonded atoms are not listed in that reference.

FIGURE 6. Positions of the local minima with syn-periplanar groups C=O
and O-H according to RHF/6-311++G** and MM3(94) calculations with
parameter sets I, II, and III. The pucker-up positions are labeled (®)
parameter set I, (O) set II, (x) set III, and (■) RHF; the pucker-down
positions are labeled (e) set I, (O) set II, (+) set III, and (m) RHF.
The positions of the global minima are N2-C3-C4-O5 = 174.4°,
H1-N2-C3-C4 = -16.57° (RHF), and N2-C3-C4-O5 = 70°, and H1-N2-C3-C4 =
-39.7° (MM3, parameter sets I, II, and III).

FIGURE 7. Positions of the local minima with syn-periplanar groups C=O
and O-H according to RHF/6-311++G** and MM3(94) calculations with
parameter set IV: (•) MM3; (■) RHF, ring pucker-up; (O) MM3; (m) RHF,
ring pucker-down. The positions of the global minima are N2-C3-C4-O5 =
174.4°, H1-N2-C3-C4 = -16.57° (RHF), N2-C3-C4-O5 = -176.8°, and
H1-N2-C3-C4 = -9.9° (MM3).
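The conflicting verdicts of the three criteria for conformers 3 and 4 can be tabulated directly from the numbers quoted above. The sketch below is only bookkeeping: the dictionary layout and variable names are illustrative, and "stronger" follows the usual conventions (shorter H···N distance, larger bond order, lower O-H frequency).

```python
MEAN_FREE_OH_FREQ = 4118.0  # cm^-1, mean over conformers 1, 2, 5, 6, 7, 8, 9, and 10

# RHF/6-311++G** values quoted in the text for the H-bonded conformers.
h_bond_data = {
    3: {"distance": 2.003, "bond_order": 0.019, "oh_freq": 3991.8},
    4: {"distance": 2.032, "bond_order": 0.030, "oh_freq": 4004.1},
}

# Red shift of the O-H stretching frequency relative to the non-bonded mean.
red_shift = {c: round(MEAN_FREE_OH_FREQ - d["oh_freq"], 1)
             for c, d in h_bond_data.items()}

# Conformer judged to have the stronger H bond under each criterion.
stronger_by = {
    "distance": min(h_bond_data, key=lambda c: h_bond_data[c]["distance"]),
    "bond_order": max(h_bond_data, key=lambda c: h_bond_data[c]["bond_order"]),
    "oh_freq": min(h_bond_data, key=lambda c: h_bond_data[c]["oh_freq"]),
}
print(red_shift, stronger_by)
```

The red shifts come out as 126.2 cm⁻¹ for 3 and 113.9 cm⁻¹ for 4, so distance and frequency both point to 3, and only the bond order points to 4.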


A repetition of these optimizations gave the following results for the
intramolecular N···H-O hydrogen bond: in glycine, a distance of 2.063 Å
(the same value as obtained by Hu et al. with a TZ2P basis set [4]) and
a bond order of 0.019; and in alanine, a distance of 2.030 Å and a bond
order of 0.023. In these data, a bond length decrease is accompanied by
an increase of bond order, so these two measures for H-bond strength
agree well. However, the stronger bond occurs for alanine, not for
glycine, although glycine definitely has minimal steric strain. The
increase of H-bond strength when going from glycine to α-alanine may be
interpreted either as an electronic effect of the methyl group onto the
lone pair of the nitrogen atom or as a steric effect of pushing the
-COOH group away from the side chain and thus closer to the nitrogen
atom. In view of the data for 3 and 4, it appears that the explanation
as a steric effect is correct, because in 4, the CH2-C=O fragment is in
a staggered orientation, whereas in 3, it is eclipsed, which adds
another push to the -COOH group.

A  different  topic  that  deserves  discussion  is  the 
comparison  of  the  amino  acid  as  an  isolated 
molecule  with  the  amino  acid  as  the  building  block 
of  peptides  and  proteins.  Such  a  comparison  can 
nicely be done for proline, because a theoretical
structure investigation of N-acetyl-L-proline amide
has been published recently [49]. A three-letter
code xYz was used to label the amide conformers,
in which x = c, t denotes the orientation of the
acetyl group (cis vs. trans), Y = A, B, C, F describes
the orientation of the -CONH2 group relative
to the pyrrolidine ring in the notation introduced
by Zimmerman et al. [50], and z = u, d
indicates the orientations pucker-up and pucker-down.
In the notation of [50], A characterizes
peptide conformations with -110° < φ < -40°
and -90° < ψ < -10°, B characterizes conformations
with -180° < φ < -110° and -40° < ψ <
20° or -110° < φ < -40° and -10° < ψ < 50°, C
characterizes conformations with -110° < φ <
-40° and 50° < ψ < 130°, and F characterizes conformations
with -110° < φ < -40° and ψ > 130°
or ψ < -140°.
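
The region boundaries quoted above can be turned into a small lookup. The following is a minimal sketch, not code from the study: the function names `zimmerman_region` and `conformer_label` and the example torsion angles are hypothetical, and points lying exactly on a boundary are left unclassified because the text gives strict inequalities.

```python
def zimmerman_region(phi, psi):
    """Classify backbone torsions (degrees) into region A, B, C, or F.

    Only the four regions quoted in the text are covered; any other
    (phi, psi) pair, including boundary points, returns None.
    """
    core_phi = -110.0 < phi < -40.0  # phi window shared by A, C, F and part of B
    if core_phi and -90.0 < psi < -10.0:
        return "A"
    if (-180.0 < phi < -110.0 and -40.0 < psi < 20.0) or (
        core_phi and -10.0 < psi < 50.0
    ):
        return "B"
    if core_phi and 50.0 < psi < 130.0:
        return "C"
    if core_phi and (psi > 130.0 or psi < -140.0):
        return "F"
    return None


def conformer_label(acetyl, phi, psi, pucker):
    """Build an xYz label, e.g. 'tCd', from x in {'c','t'} and z in {'u','d'}."""
    region = zimmerman_region(phi, psi)
    if region is None:
        raise ValueError("torsions fall outside regions A, B, C, F")
    return f"{acetyl}{region}{pucker}"


# Hypothetical torsion angles, chosen only to land inside region C.
example = conformer_label("t", -75.0, 80.0, "d")
```

Here `example` evaluates to the label `tCd`, the form of the labels used for the amide conformers in the discussion that follows.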

In  comparing  the  results  of  the  free  acid  with 
those  of  the  amide,  similarities  as  well  as  differ¬ 
ences  can  be  noted.  The  similarities  are  mostly 
caused  by  the  five-membered  pyrrolidine  ring  and 
its  ability  to  adopt  a  pucker-up  and  a  pucker-down 
orientation  and  by  the  relatively  low  barrier  for  the 
up ⇌ down interconversion. In both systems, there
are conformer pairs Fd/Fu, Cd/Cu, and Ad/Au
and a conformer Au that has no matching Ad. In
addition, for the free acid, there is a conformer Bd
(6) without a matching Bu in the 6-311++G**
PES.  The  most  obvious  differences  are  related  to 
the  amino  group  substituent:  In  the  amide,  the 
acetyl  group  is  present  either  in  cis  or  trans  orien¬ 
tation  and  the  substitution  pattern  of  the  nitrogen 
atom  is  planar;  in  the  free  acid,  the  character  of  the 
nitrogen  atom  is  pyramidal,  as  already  mentioned. 

In the amide, the pucker-down conformer is
always more stable than the pucker-up analog,
with (6-31G*) energy differences of 3.7, 4.6, and
8.0 kJ/mol; in the free acid, this is also true except
for the global minimum 1, which is more stable
than the corresponding pucker-down conformer
2. The energy difference between pucker-up and
pucker-down is significantly smaller in the free
acid, with a maximum value of 2.6 kJ/mol. Inspection
of the molecular structures shows that the
higher pucker-up/pucker-down energy differences
in the amide are caused by repulsive steric effects
due to the acetyl group, which is much more
space-filling than the hydrogen atom in the free
acid. The largest energy difference, 8.0 kJ/mol,
occurs between the amide conformers tCd and
tCu, which are lowest in absolute energy because
they are stabilized by a γ-turn N-H···O=C
hydrogen bond that is formed between the amide
and the acetyl group. This γ-turn hydrogen bond
is characterized by a bond order of more than 0.04
and results in a considerable stabilization; e.g., the
(6-31G*) potential barriers of those reactions
that break this hydrogen bond are 55.6, 75.9,
78.3, 83.0, 90.2, and 97.0 kJ/mol in the case of tCd.
As a consequence of this hydrogen bond, the fragment
H2C-CO-N-CH2 is forced into an unfavorable
orientation: In tCd, the hydrogen atoms in
this fragment are oriented next to each other in
an eclipsed position, whereas in tCu, one of the
ring hydrogen atoms points almost directly toward
one of the acetyl group hydrogen atoms. The energy
difference of 8.0 kJ/mol between tCd and tCu
is therefore not only an effect of the pyrrolidine
ring puckering, which contributes to a minor extent,
but also a consequence of the γ-turn hydrogen bond.

Another interesting aspect of the comparison
between free acid and amide is the energetic
order of the conformers as a function of the orientation
of the -CO-R group. In the amide, this
order is C < B ~ A < F; in the free acid, it is
F < C ~ B ~ A. Comparing these orders, one notes
that conformation C is more stable in the amide,
which is a direct consequence of the stabilizing
γ-turn N-H···O=C hydrogen bond discussed
above. One also notes that conformation F is
shifted to significantly lower energies in the free
acid: In the amide, the relative (6-31G*) energies
of conformers cFd and cFu are 24.008 and 27.703
kJ/mol, respectively, whereas in the free acid, the
Fu conformation (1) is the global minimum and Fd
(2) has a relative energy of 2.264 kJ/mol. This shift
in energy (conformers highest in energy vs. conformers
of lowest energy) is much too large to be a
consequence of the weak N-H···O=C interaction
in 1 and 2. There also is no steric effect of the
CH3 terminal of the acetyl group, as shown by the
corresponding N-formylproline amide conformers,
which do not have this terminal CH3 group:
The relative (6-31G*) energies are 21.931 kJ/mol
for cFd and 23.810 kJ/mol for cFu in this case [51].
Hence, this shift in energy can only be ascribed to
the orientation of the C=O group, which corresponds
to the peptide torsion ω = 0° in cFd and
cFu and also in cBd and cAu. The latter pair does
not show such a dramatic shift in energy, which is
explainable by the attractive N···H-N-CO interaction
that is present in these two conformers.
This interaction is of similar strength as the
N-H···O=C interaction in the F conformation
of the free acid. In total, these individual effects
give the following picture: The 0° orientation of the
central peptide bond torsion ω is considerably
destabilized; this destabilization is lowered in conformations
A and B by attractive interactions in
the amide and amplified in the F orientation by
attractive interactions in the free acid. A quantitative
treatment would be highly desirable but cannot
be performed in a straightforward manner,
because the absolute RHF energies of the two
systems are not comparable.


Summary 

The energetically low-lying parts of the PES of
neutral L-proline were investigated by ab initio
RHF calculations with different basis sets. Various
basis-set dependencies could be noted regarding
the number and the nature of the local minima as
well as a significant trend in some structural parameters;
the inclusion of diffuse functions appears
to be essential in this case.

Similar calculations were performed using the
MM3(94) force field, which lacks constants for the
N-C-C-O torsion in neutral amino acids.
Straightforward guesses for these constants turned
out to be totally misleading. A more elaborate
guess based on RHF/6-31G* data of α-alanine
gave somewhat more realistic results. This guess,
however, was also unable to reproduce certain
significant features of the RHF PES, so it seems
that the behavior of the pyrrolidine ring in L-proline
cannot be sufficiently described by calculating
only one missing parameter using the open-chain
molecule α-alanine.

The  comparison  of  the  results  obtained  for  the 
free  acid  with  those  of  the  model  dipeptide  com¬ 
pound  N-acetylproline  amide  as  well  as  with  those 
of α-alanine and glycine gives interesting insight
into stability patterns of structural fragments,
which cannot be gathered from either of the individual
species.

ABSTRACT: The integrated molecular transform (FTm) has been used for the
correlation of the structures of organic molecules with their physicochemical,
thermodynamic, and pharmacological properties; it is also an excellent conformation
index and functions as a discriminator of classical chemical structure types. In this study,
it is used along with our recently introduced normalized molecular moment (Mn), and
new structure indices, viz. the integrated electronic transform (FTe), the integrated charge
transform (FTc), and the electronic moment (Me), to establish appropriate models for the
title subject. Initially, the principal absorption maxima in each of several series were
regressed against the structural indices to determine which index best represented the
structures in the context of the absorption data. The indices were then selectively
regressed against the absorption data to generate absorbance estimation equations. In a
series of multicyclic hydrocarbons, the FTm functioned as a topological structure
discriminator as well as a structure surrogate. In the topological subsets, the FTe and FTc
also were selectively useful. For a series of conjugated dienes, the FTm and the Mn were
statistically appropriate. In a series of substituted benzenes, the discrimination of
halobenzenes was apparent and could be represented by either the FTm, FTe, or Me
indices. For other variously substituted benzenes, the FTm is the extant model and further
work with larger, structurally delineated series is warranted. For a series of monoalkyl-substituted
nitrobenzenes, the FTm and FTe parameters are appropriate variables.
Satisfactory correlation of molar absorptivities was not possible in this study as it would
require absorption curve integration in the range where the maxima occur. © 1997 John
Wiley & Sons, Inc. Int J Quant Chem 65: 1047-1056, 1997

Introduction


The integrated molecular transform (FTm) is
derived from the molecular transform [I(s)]
of Soltzberg and Wilkins [1, 2], which in turn was
based on the electron diffraction analysis work of
Wierl [3]. In this methodology, molecular descriptors
such as bond lengths, bond counts, or other
parameters were treated by a Fourier transform
weighted by the atomic numbers of the constituent
atoms of a molecule, to give ordinates of a curve;
the curve was then used to generate a 100-digit
binary number which could be used as a structure
surrogate in structure-activity/property correlation
and prediction studies. In our formulation,
which utilizes interatomic bond distances [Eq. (1)],
the ordinates of the I(s) curve are squared, the
area under the resulting curve integrated, and the
square root of the area taken as the FTm, as shown
in Eq. (2).

$$I(s) = \sum_{i=2}^{n} \sum_{j=1}^{i-1} A_i A_j \, \frac{\sin(s r_{ij})}{s r_{ij}} \qquad (1)$$

$$\mathrm{FT}_m = \sqrt{\int_{1}^{31} I^2(s)\, ds} \qquad (2)$$


A_i and A_j in Eq. (1) are form (or weighting)
factors. In the FTm case, they are the atomic numbers
of the atoms between which the coordinate
distance was determined in either a two-center or
multicenter aspect. In this work, we introduce additional
transforms, i.e., the integrated molecular
electronic transform (FTe) in which the weighting
factors are the calculated electron densities of the
respective atoms and the integrated charge transform
(FTc) in which the weighting factors are the
calculated atomic charges on the respective atoms.
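
The transform of Eq. (1) and its integrated form, Eq. (2), can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' program: the function names are hypothetical, the integration range of s from 1 to 31 Å⁻¹ and the trapezoid rule are assumptions of this sketch, and the two-atom geometry in the example is invented for demonstration. Passing atomic numbers as weights corresponds to FTm; electron densities or atomic charges would give the FTe and FTc analogs.

```python
import math


def molecular_transform(s, coords, weights):
    """I(s): sum over atom pairs of A_i * A_j * sin(s*r_ij) / (s*r_ij)."""
    total = 0.0
    for i in range(1, len(coords)):
        for j in range(i):
            r_ij = math.dist(coords[i], coords[j])  # interatomic distance (Å)
            x = s * r_ij
            total += weights[i] * weights[j] * math.sin(x) / x
    return total


def integrated_transform(coords, weights, s_min=1.0, s_max=31.0, steps=3000):
    """Square root of the area under I(s)^2, by the trapezoid rule.

    s_min, s_max, and steps are assumptions of this sketch.
    """
    h = (s_max - s_min) / steps
    area = 0.0
    for k in range(steps + 1):
        s = s_min + k * h
        end_weight = 0.5 if k in (0, steps) else 1.0
        area += end_weight * molecular_transform(s, coords, weights) ** 2
    return math.sqrt(area * h)


# Invented two-atom example: weights 6 and 8 (atomic numbers), 1.5 Å apart.
demo_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
demo_ft = integrated_transform(demo_coords, [6, 8])
```

Because I(s) is bilinear in the weighting factors, doubling every weight multiplies the integrated transform by four, which is a convenient sanity check for any implementation.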

We have recently introduced [4, 5] another
structure index, the normalized molecular moment
(Mn), which is the sum of the products of the
atomic weight of each constituent atom multiplied
by its distance from the geometric center of the
molecule, divided by the molecular weight [Eq.
(3)]. In the present instance, we also introduce the
molecular electronic moment (Me), which is the
sum of the products of the electron density on
each atom multiplied by its distance from the
geometric center of the molecule [Eq. (4)].


$$M_n = \frac{1}{W_m} \sum_{j=1}^{n} a_j c_j \qquad (3)$$

$$M_e = \sum_{j=1}^{n} \rho_j c_j \qquad (4)$$

where M_n is the normalized molecular moment, n
is the number of atoms in the molecule, c_j is the
distance from the molecular geometric center to
atom j (Å), a_j is the atomic weight of atom j, W_m
is the molecular weight of the molecule, M_e is the
normalized electronic moment, and ρ_j is the calculated
electron density of atom j.
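
The normalized molecular moment described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the geometric center is assumed to be the unweighted mean of the atomic coordinates, and the benzene geometry is an idealized planar hexagon with assumed ring radii of 1.40 Å (carbon) and 2.48 Å (hydrogen) rather than an optimized structure.

```python
import math


def geometric_center(coords):
    """Unweighted mean of the atomic positions (an assumption of this sketch)."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))


def normalized_molecular_moment(coords, atomic_weights):
    """Sum of (atomic weight x distance from center), divided by molecular weight."""
    center = geometric_center(coords)
    moment = sum(a * math.dist(p, center) for p, a in zip(coords, atomic_weights))
    return moment / sum(atomic_weights)


def ring(radius):
    """Six points on a regular hexagon of the given radius in the xy plane."""
    return [
        (radius * math.cos(k * math.pi / 3), radius * math.sin(k * math.pi / 3), 0.0)
        for k in range(6)
    ]


# Idealized benzene: assumed radii, standard atomic weights for C and H.
benzene_coords = ring(1.40) + ring(2.48)
benzene_weights = [12.011] * 6 + [1.008] * 6
benzene_mn = normalized_molecular_moment(benzene_coords, benzene_weights)
```

With these assumed radii the sketch gives a value near 1.48 for benzene, in line with the Mn entries listed for benzene in Table I. Replacing the atomic weights with calculated electron densities would give the corresponding electronic sum.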

In  each  of  these  formulations,  interatomic  dis¬ 
tances  may  be  based  on  standardized  bond  dis¬ 
tances  or  determined  from  structures  optimized  by 
molecular  mechanics  methodology  or  quantum 
chemical methods (e.g., MOPAC); the latter also
serves  to  generate  the  atomic  electron  densities 
and  charges. 

The unitary FTm index has been used to correlate
both physicochemical and pharmacological
properties, viz. the polarizability and local anesthetic
activity in a structurally diverse series [6];
an enthalpy function and heats of formation in a
series of hydrocarbons, most of which were
branched [7]; the enzyme inhibition activity of
several series of organophosphorus compounds [8];
the octanol/water partition coefficient in other
organophosphorus series where the index was
shown also to be a structure discriminator [9]; the
gas chromatographic (GC) retention indices of a
series of mustard [bis(2-chloroethyl) sulfide]
analogs [10]; pKa with structure as well as functioning
as a structure discriminator [11]; dermal
transport rate with classically delineated structures
[12]; as a basis for a simple similarity index [5, 13];
a unitary numerical descriptor of structure conformation
[14]; and for correlation of diamagnetic
susceptibility of organic compounds in a structurally
diverse series [15]. The FTm correlates well
with some theoretical linear solvation energy relationships
(TLSER) [16]. In the noted examples,
linear modeling resulted in correlation coefficients
of greater than 0.9.

In  considering  additional  molecular  physico¬ 
chemical  attributes  that  might  be  amenable  to 
modeling  with  molecular  transforms,  ultraviolet 
(UV) spectra came to mind. This was because
consideration of structural detail often permits reasonably
accurate prediction of UV absorption
bands and, to an extent, molar absorptivity.
The methodology developed for this is generally
referred to as Woodward's or the Woodward-Fieser
rules, and Scott's rules for special cases [17-19].
This suggested that our transform and moment
concepts might be applied to the direct correlation
of molecular structure and UV spectra. This study
describes the results of that application.


Methodology 

The structures investigated in this study, their
associated absorption data, and calculated indices
are shown in Tables I, III, V, and VII. The interatomic
distance and electronic information necessary
for calculation of the transform and electronic
moment indices, and distance to constituent atoms
from the geometric center of the molecules, as
required for calculation of both moment indices,
were obtained by first drawing the structures in
ChemDraw Pro (v. 3.5). They were then imported
into Chem3D Pro (v. 3.5), which permitted optimization
by MM2 or MOPAC93 methods for generation
of the necessary distances, and electronic
information by the latter [20]. The modeling and
statistical data were calculated by SYSTAT [21] or
with a TI-59 programmable calculator using the
routines of Clark [22].


Results  and  Discussion 

AROMATIC  HYDROCARBONS 

Table I shows the data for 19 compounds. A
plot of the major UV absorption band for these
versus their calculated FTm indices revealed, by
observation, three distinct groups, as shown in
Figure 1. Consideration of the structures shows
topological similarities within each group that are
not necessarily common to the other groups. This
example again demonstrates the remarkable power
of the FTm to discriminate classical structure.


TABLE I

Absorption maxima [23, 24], integrated molecular transform, and normalized molecular moment indices
for aromatic compounds.

No.  Compound              νmax a    εmax b     FTm c      FTm d      FTe e       FTc f      Mn g       Mn h
                           (kK)      (L/mol cm)
                                     (×10^-3)
 1   Benzene               54.50      46.00    130.8006   132.2525    64.36861   0.859055   1.482279   1.480615
 2   Naphthalene           45.40     133.00    197.1029   201.1666    95.36857   1.14874    1.986480   1.982338
 3   Acenaphthene          43.80      93.00    221.5079   226.4143   106.66258   1.406341   2.108119   2.122805
 4   Anthracene            39.00     180.00    284.7887   288.351    135.77252   1.340451   2.558719   2.553458
 5   Naphthacene           36.70     190.00    368.5519   374.9043   175.51878   1.487942   3.112254   3.104805
 6   Pentacene             32.30     300.00    454.0849   460.3999   214.92817   1.599764   3.703059   3.624926
 7   Phenanthrene          39.40      65.50    273.3747   275.7011   129.76932   1.356991   2.46074    2.459848
 8   Chrysene              37.20     150.00    354.3647   370.9624   173.72862   1.514039   2.939143   2.970897
 9   1,2-Benzanthracene    34.80     113.00    364.4124   370.7791   173.63894   1.509238   3.049184   2.970879
10   3,4-Benzanthracene    35.60      85.00    349.1559   353.8025   165.97919   1.465682   2.855678   2.862941
11   Triphenylene          38.90     150.00    359.8419   365.5476   172.02089   1.563897   2.798721   2.794283
12   Azulene               37.10      47.00    176.9720   180.5548    85.96730   1.162454   2.021417   2.019807
13   1,2-Benzazulene       33.30      60.00    252.1937   256.3687   120.82545   1.406654   2.515555   2.518336
14   Biphenyl              50.50      52.00    234.9134   239.1399   113.92124   1.108196   2.501849   2.509916
15   Fluorene              48.00      43.00    251.7534   255.3227   120.32683   1.333206   2.380029   2.38081
16   2-Phenylnaphthalene   20.00      65.00    322.1034   327.2158   154.47098   1.335809   3.013256   2.967535
17   1,2-Benzfluorene      38.00      70.00    322.1034   333.6507   156.17543   1.519029   3.013256   2.854355
18   2-Phenylazulene       33.10      85.00    293.1007   300.8000   141.98315   1.542411   3.043103   3.111077
19   Indenoazulene         30.90      64.00    308.1031   313.5010   146.90808   1.698667   2.915621   2.937072

a Absorption wave number in kiloKaisers (kK).
b Molar absorptivity.
c Structure optimization by MM2.
d Structure optimization by MO.
e Integrated molecular electronic transform.
f Integrated molecular charge transform.
g,h Normalized molecular moment with structure optimization by MM2 and MO, respectively.

FIGURE 1. FTm versus absorption wave number (in
kiloKaisers) for the aromatic compound series.
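
The absorption maxima in Table I are tabulated as wave numbers in kK (the kilokayser, written kiloKaiser in the table footnote; 1 kK = 1000 cm⁻¹), whereas UV bands are often quoted as wavelengths. The short conversion below is offered only as a reading aid; the function name is hypothetical.

```python
def kk_to_nm(wavenumber_kk):
    """Convert a wave number in kK (1 kK = 1000 cm^-1) to a wavelength in nm."""
    wavenumber_per_cm = wavenumber_kk * 1000.0
    # 1 cm = 1e7 nm, so wavelength (nm) = 1e7 / wave number (cm^-1)
    return 1.0e7 / wavenumber_per_cm


benzene_band_nm = kk_to_nm(54.50)    # Table I entry for benzene
pentacene_band_nm = kk_to_nm(32.30)  # Table I entry for pentacene
```

On this scale the benzene entry of 54.50 kK corresponds to roughly 183.5 nm and the pentacene entry of 32.30 kK to roughly 309.6 nm.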


Correlation  Analysis 

Table II shows the results of correlating the
principal absorption band and molar absorptivity
against the various calculated structural indices
for each group. In group 1, the best model for
absorption wave number (ν) is with the FTc followed
by the FTm, as judged by the correlation
coefficients and F statistics. It is not obvious why
the FTc should stand out in this manner, particularly
in comparison to the models for groups 2 and
3. However, this may again reflect the discriminatory
power of the transform paradigm, i.e., the
group 1 compounds are seven-membered unsaturated
rings fused to other ring systems, and this
ring system has a greater tendency to form charged
carbonium ion species than analogous six-membered
rings.

In group 2, the FTc, while not a bad model for ν,
is inferior to all the other indices, particularly
those involving structure optimization by quantum
chemical (MO) methods as compared to
molecular mechanics (MM2).


TABLE II

Linear correlation of absorption data and molecular indices in Table I.

                FTm (MM2)   FTm (MO)   FTe       FTc      Mn (MM2)   Mn (MO)

Group 1 (compounds 12, 13, 18, 19)
νmax   -R a       0.959      0.953     0.948     0.975     0.865      0.845
       F b       22.728     19.619    17.893    38.506     5.924      4.976
       s c       20.450     22.372    10.764     0.062     0.283      0.319
εmax   R          0.770      0.782     0.790     0.651     0.895      0.910
       F          2.907      3.149     3.325     1.474     8.013      9.570
       s         45.905     45.839    20.819     0.211     0.252      0.248

Group 2 (compounds 1, 2, 3, 4, 7)
νmax   -R         0.986      0.988     0.987     0.912     0.990      0.991
       F        104.853    126.038   111.080    14.924   148.102    167.138
       s         11.995     11.050     5.376     0.107     0.070      0.066
εmax   R          0.671      0.680     0.678     0.596     0.701      0.638
       F          2.454      2.584     2.562     1.649     2.891      2.057
       s         53.342     53.117    24.347     0.209     0.353      0.380

Group 3 (compounds 5, 6, 8, 9, 10, 11, 14, 15, 16, 17)
νmax   -R         0.933      0.938     0.937     0.882     0.845      0.832
       F         53.642     59.090    57.106    27.883    19.902     17.951
       s         23.674     23.082    10.742     0.073     0.205      0.208
εmax   R          0.882      0.878     0.879     0.664     0.829      0.882
       F         27.916     26.859    27.070     6.294    17.519     28.004
       s         31.014     32.021    14.636     0.116     0.214      0.177

a Correlation coefficient.
b F statistic.
c Standard deviation.


CORRELATION OF ULTRAVIOLET SPECTRA

In group 3, the best models for ν are the FTm
and  FTe  with  the  remainder  being  very  obviously 
inferior.  Again,  the  discriminatory  power  of  the 
transforms  is  evident  inasmuch  as  the  compounds 
in  this  group  are  the  most  extensively  conjugated 
of  the  three;  it  would  appear  that  this  characteriza¬ 
tion  is  reflected  in  the  indices. 
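
The statistics reported in Table II can be reproduced with an ordinary least-squares fit. The sketch below is an illustration under stated assumptions, not the authors' SYSTAT or TI-59 routine: the function name is hypothetical, F is computed as r²(n - 2)/(1 - r²), and s is taken as the standard error of estimate with the structural index as the dependent variable, a choice inferred here from the magnitudes of the tabulated s values. The input data are the Table I entries for the group 2 compounds (1, 2, 3, 4, 7) with FTm from MM2 geometries.

```python
import math


def linear_fit_statistics(x, y):
    """Return (r, F, s) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)               # Pearson correlation coefficient
    f_stat = r * r * (n - 2) / (1.0 - r * r)     # F statistic, 1 and n-2 d.o.f.
    s = math.sqrt(syy * (1.0 - r * r) / (n - 2))  # standard error of estimate in y
    return r, f_stat, s


# Table I, group 2: absorption wave number (kK) and FTm (MM2 geometries).
nu_max = [54.50, 45.40, 43.80, 39.00, 39.40]
ft_m = [130.8006, 197.1029, 221.5079, 284.7887, 273.3747]

r, f_stat, s = linear_fit_statistics(nu_max, ft_m)
```

For these five compounds the sketch returns r of about -0.986, F of about 104.85, and s of about 11.995, in close agreement with the first column of the group 2 block of Table II.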

With respect to molar absorptivity (εmax), none of
the indices provides a good model with the lone
exception of the normalized molecular moment
(Mn), calculated from quantum chemical parameters,
for the group 1 compounds. In reality, εmax
was not well correlated for any series in this study,
with the group 1 results being rather typical. This
may be due to the fact that, while νmax is usually
quite obvious from spectra, εmax involves a concentration
term and its modeling should reflect the
area under the absorption curve because the latter
may include a convolution of several less intense,
and perhaps broader, absorption bands. In light
of these observations and since actual spectra were
not available, εmax correlations were not further
considered.

CONJUGATED  DIENES 

Table  III  shows  the  absorption  maxima  and  cal¬ 
culated  molecular  indices  for  11  dienes.  A  plot  (not 
shown) of FTm against νmax shows a slight curvature
due principally to compound 9. This molecule
is  a  bicyclohexenyl  with  the  double  bonds  uncon¬ 
jugated  and  is  thus  slightly  different  from  the 
remaining  compounds  in  this  series. 

Correlation  Analysis 

Table IV shows the data for this series with the
Mn being only marginally superior, according to
the correlation coefficients and F statistics, to FTm
and FTe.


TABLE III

Absorption maxima [25], transform, and moment indices for conjugated dienes.

No.  Compound                                               νmax (kK)   FTm         FTe         FTc       Mn        Me
 1   1,3-Butadiene                                          45.870      74.30927    38.62814   0.56652   1.38732   1.51445
 2   2,3-Dimethyl-1,3-butadiene                             44.043     107.88840    54.72790   0.78878   1.63076   1.78608
 3   2,4-Hexadiene                                          43.849     114.75564    60.40481   0.76457   2.03457   2.19651
 4   2-Methyl-1,3-butadiene                                 45.244      91.02438    46.71057   0.68095   1.49105   1.62970
 5   2,4-Pentadiene                                         44.536      93.26402    48.82182   0.68052   1.70870   1.85779
 6   3-(Cyclohexylidene)propene                             42.088     165.70787    84.10878   0.89867   2.25838   2.38046
 7   2-(1-Cyclohexenyl)propene                              42.356     158.92196    79.14550   1.02114   2.13936   2.28263
 8   4-(Methylene)isopropylcyclohex-2-ene                   42.904     175.84540    88.02970   0.92126   2.30553   2.42180
 9   1,3'-Bicyclohexenyl                                    42.177     243.66184   122.19467   1.21796   2.51875   2.66278
10   1-Cyclohexylidene-2-(4-hydroxycyclohexylidene)ethane   40.136     261.29029   131.93339   1.05001   3.31174   3.42462
11   1-(Methylene)cyclohex-2-ene                            43.090     138.76357    69.41557   0.86744   1.72769   1.85094

TABLE IV

Linear correlation of absorption data and molecular indices from Table III.

νmax vs.   FTm       FTe       FTc      Mn       Me
-R a        0.922     0.922    0.859    0.931    0.601
F b        50.848    50.684   25.265   58.451    5.092
s c        25.087    12.416    0.103    0.214    0.560

a Correlation coefficient.
b F statistic.
c Standard deviation.

SUBSTITUTED BENZENES

Table  V  lists  the  data  for  this  series.  A  plot  (not 
shown) of FTm versus νmax was generally linear
but  with  some  clustering  evident  and  with  com¬ 
pounds  8  and  20  being  apparent  outliers,  although, 
as  shown  below,  they  could  be  included  in  se¬ 
lected  correlations.  The  halobenzenes  have  classi¬ 
cally  been  regarded  as  unique  and  this  instance 
was  not  an  exception. 

Correlation  Analysis 

Table VI shows the data for this series. Once the
decision was made to group the halobenzenes by
themselves, the correlations fell into place. The
model for all compounds except 8 and 20, with
correlation coefficients of -0.735 and -0.777 for
the FTm and Me, respectively, and good F statistics,
is probably not sufficiently accurate for predictive
use. The halobenzene model is excellent
with Me and FTm being the better and FTe next.
This order may reflect to some extent the electronic
nature of the halogens in this particular
series as indicative of their influence on molecular
behavior. When other substituted benzenes are
added to the model, as in the third part of the
table, the correlation coefficients and F statistics
drop precipitously except for FTm with an R value
of -0.897 and F statistic of 66.202. These statistics
are an indication of the robustness of the FTm as a
unitary index of structure. It would be reasonable
to consider use of this model as a preliminary
predictor of UV absorption bands for series
including similar molecules.

NITROBENZENES 

Table VII shows the absorption data for this
series in both the vapor phase and heptane solution.
Again, the highly electronic nature of the
nitro substituent, analogous to the halogens, suggested
considering these molecules separately. A
plot (not shown) has a little curvature which is
due to nitrobenzene itself. In a way this is not too
surprising since the remaining five compounds are
substituted with aliphatic groups. In fact, the magnitude
of the FTm index increases with side chain
size, an observed characteristic of this index which
is paralleled by the FTe data but not the other
indices. Further, superimposing the nitrobenzene
plot on that for the substituted benzenes shows
that although they are parallel, they are far enough
apart that they cannot be included in the same
regression. Again, the structural discrimination
power of the FTm is evident.


TABLE V

Absorption maxima [26] and molecular indices for substituted benzenes.

No.  Compound                 ν (kK)   εmax (×10^-3)   FTm         FTe         FTc        Mn        Me
                                       (L/mol cm)
 1   Benzene                  54.4     45.0            132.34663    64.41509   0.85874    1.48022   1.58640
 2   Hexaethylbenzene         47.2     40.0            238.54664   119.82010   1.706953   2.83153   3.01723
 3   Chlorobenzene            52.7     54.0            136.63966    62.87805   0.75424    1.92053   1.83049
 4   o-Dichlorobenzene        51.2     60.0            148.49378    61.95596   0.69743    2.15249   1.99793
 5   m-Dichlorobenzene        51.0     38.0            158.62699    65.43174   0.66886    2.18051   2.02098
 6   p-Dichlorobenzene        51.8     43.0            166.64601    66.65557   0.64518    2.24461   2.05141
 7   1,3,5-Trichlorobenzene   49.4     50.0            204.58498    71.95591   0.56605    2.41777   2.20634
 8   Hexachlorobenzene        46.0     90.0            422.80023   100.28110   0.53118    2.66803   2.47292
 9   Bromobenzene             52.4     36.0            140.58889    60.02977   0.88729    2.24999   1.85583
10   Iodobenzene              43.9     40.0            155.67567    58.57964   1.07213    2.47846   1.87564
11   Phenol                   52.7     50.0            149.42181    72.45640   0.79005    1.67302   1.77013
12   o-Cresol                 52.2     70.0            157.98198    76.31838   0.89596    1.80292   1.92473
13   m-Cresol                 51.8     60.0            160.16019    77.82347   0.89545    1.86329   1.98119
14   p-Cresol                 52.3     45.0            162.61700    79.22689   0.89353    1.88121   2.00826
15   Hydroquinone             52.6     28.0            169.14650    82.26611   0.73114    1.86901   1.96300
16   Aniline                  50.8     32.0            147.04542    71.58923   0.93166    1.68716   1.79958
17   Benzoic acid             51.0     38.0            177.86696    85.33450   0.60347    2.01712   2.10940
18   Benzaldehyde             49.9     26.0            154.42676    74.05752   0.64463    1.89962   1.98488
19   Nitrobenzene             48.5     13.0            183.38723    89.05759   0.23473    2.02554   2.09886
20   Benzonitrile             52.2     44.0             65.47588    31.07088   1.17193    3.35662   3.47191
21   Phenylacetylene          50.2     30.0            150.64614    71.60076   0.87371    1.92476   2.01899
22   Styrene                  49.2     23.0            155.34296    75.23977   0.95890    1.86448   1.98530
23   o-Toluic acid            50.5     35.0            183.65238    88.08477   0.72782    2.07686   2.18786
24   m-Toluic acid            50.0     40.0            192.22253    92.84231   0.66445    2.22185   2.3376


TABLE VI

Linear correlation of absorption data and molecular indices from Table V.

νmax vs.   FTm       FTe       FTc      Mn       Me

All compounds except 8 and 20
-R a        0.735     0.631    0.142    0.592    0.777
F b        23.506    13.265    0.409   10.781   30.422
s c        16.956    10.956    0.271    0.249    0.177

Halobenzenes (compounds 3 to 10 inclusive)
-R          0.956     0.942    0.586    0.801    0.959
F          63.888    47.239    3.146   10.716   68.368
s          30.286     4.902    0.155    0.148    0.066

Compounds 3 to 17 inclusive and 20, 23, 24
-R          0.897     0.534    0.541    0.343    0.260
F          66.202     6.399    6.610    2.139    1.164
s          29.054    13.581    0.150    0.393    0.387

a Correlation coefficient.
b F statistic.
c Standard deviation.


Correlation  Analysis 

Table VIII shows the statistical parameters for the substituted
nitrobenzenes. Again, the FTm index is shown to be the best model for
absorbancies in the vapor phase as well as in solution. It is followed
closely by the FTe and the Me. This indicates that, again, the FTm is
the most applicable general index but also that electronic
characterization of the molecules can be extracted by the two electronic
indices. It is interesting to note that the charge index (FTc)
correlation rules out charge as a factor in describing these molecules,
at least in


TABLE VII

Absorption maxima [25] and molecular indices for nitrobenzenes.

                                Vapor    Heptane
                                phase    solution
No.  Compound                   (kK)     (kK)       FTm         FTe         FTc       Mn        Me

1    Nitrobenzene               41.630   39.452     183.38096    89.05209   0.23403   2.02634   2.09958
2    p-Methylnitrobenzene       39.783   37.689     200.88976    98.24634   0.28060   2.23553   2.33137
3    p-Ethylnitrobenzene        39.656   37.561     218.36509   108.05060   0.26936   2.45262   2.55669
4    p-Propylnitrobenzene       39.562   37.491     238.61608   119.09394   0.13905   2.79221   2.88679
5    p-Isobutylnitrobenzene     39.421   37.378     249.31274   124.45922   0.15166   2.89931   2.98748
6    p-tert-Butylnitrobenzene   39.312   37.224     252.00474   125.02069   0.32367   2.64511   2.75585

TABLE VIII

Linear correlation of absorption data and molecular indices from Table VII.

νmax vs.      FTm       FTe       FTc       Mn        Me

Vapor phase
-R^a          0.829     0.829     0.015     0.793     0.811
F^b           8.789     8.814     0.001     6.776     7.678
s^c          17.359     9.247     0.083     0.228     0.223

Heptane solution
-R            0.825     0.825     0.003     0.784     0.802
F             8.540     8.540     0.000     6.374     7.228
s            17.531     9.347     0.083     0.233     0.227

a Correlation coefficient.
b F statistic.
c Standard deviation.


the ground state. As noted above, the correlations are influenced by
nitrobenzene itself. When this compound is omitted, the correlation
coefficients for the substituted compounds rise to -0.962 and -0.935 for
the vapor and solution phase absorbancies, respectively, with analogous
F statistics of 36.893 and 20.700.

UV ABSORPTION WAVE NUMBER ESTIMATION

In the models discussed above the wave number of UV absorption was
regressed, as the independent variable, against the respective
calculated structure indices as the dependent variable. This was done in
order to elucidate the index most favorable for use in estimation
equations. For the latter purpose the variable dependency must be
reversed, as one would wish to predict an absorption frequency based on
knowledge of structure and a corresponding calculated index. In the
following equations this has been done for the reported examples and
classes of compounds. In each instance, ν is the absorption wave number
(in kK), the molecular indices are as defined above, n is the number of
compounds in each regression, R is the correlation coefficient, s is the
standard deviation, and F is the F statistic.


Aromatics 

Estimations in this series for compounds not included in the original
model would require a judgment as to which reported topological group
the unknown compound should belong. A preferable way would be to compare
the respective FTm values for similarity, as has been reported [5, 13],
and then make the assignment; this would be similarly applicable to the
other series in this study.

Group 1

$\nu = -0.042\,FT_m + 44.428$,
n = 4, R = -0.959, s = 0.897, F = 22.728.

Group 2

$\nu = -0.099\,FT_m + 66.426$,
n = 5, R = -0.986, s = 1.209, F = 104.853.

Group 3

$\nu = -0.107\,FT_m + 74.827$,
n = 9, R = -0.960, s = 1.662, F = 81.556.

Conjugated Dienes

$\nu = -0.024\,FT_m + 46.919$,
n = 11, R = -0.922, s = 0.667, F = 50.848,
or
$\nu = -2.737\,M_n + 28.902$,
n = 11, R = -0.931, s = 0.628, F = 58.451.

Substituted Benzenes

Halobenzenes

$\nu = -0.022\,FT_m + 54.878$,
n = 8, R = -0.956, s = 0.684, F = 63.888,
or
$\nu = -0.151\,FT_e + 61.049$,
n = 8, R = -0.942, s = 0.734, F = 47.239,
or
$\nu = -9.660\,M_e + 70.433$,
n = 8, R = -0.959, s = 0.663, F = 68.368.

Other substituted benzenes (Table V, compounds 3-17 inclusive and 20, 23, and 24)

$\nu = -0.023\,FT_m + 55.235$,
n = 18, R = -0.897, s = 0.734, F = 66.202.

Alkyl-Substituted Nitrobenzenes

Vapor Phase

$\nu = -0.008\,FT_m + 41.454$,
n = 5, R = -0.962, s = 0.059, F = 36.893,
or
$\nu = -0.015\,FT_e + 41.306$,
n = 5, R = -0.950, s = 0.067, F = 27.938.

Heptane Solution

$\nu = -0.008\,FT_m + 39.232$,
n = 5, R = -0.935, s = 0.073, F = 20.700,
or
$\nu = -0.014\,FT_e + 39.089$,
n = 5, R = -0.920, s = 0.080, F = 16.630.
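As an illustration of how such an estimation equation can be obtained, the following minimal sketch refits the vapor-phase absorption maxima of the five alkyl-substituted nitrobenzenes (Table VII, compounds 2 to 6) against their FTm values by ordinary least squares. The function and variable names (`linear_fit`, `estimate_nu`, `ft_m`, `nu_vapor`) are illustrative assumptions and not part of the original study; the formulas for s and F are the standard ones for a regression with a single predictor, which is assumed here to be what the reported statistics represent.

```python
from math import sqrt

# Table VII, compounds 2-6 (alkyl-substituted nitrobenzenes).
ft_m = [200.88976, 218.36509, 238.61608, 249.31274, 252.00474]
nu_vapor = [39.783, 39.656, 39.562, 39.421, 39.312]  # vapor phase, kK


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept with summary statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)                # correlation coefficient
    sse = syy - slope * sxy                  # residual sum of squares
    s = sqrt(sse / (n - 2))                  # standard deviation of the residuals
    f_stat = (syy - sse) / (sse / (n - 2))   # F statistic, one predictor
    return {"slope": slope, "intercept": intercept, "n": n, "R": r, "s": s, "F": f_stat}


def estimate_nu(ft_m_value, fit):
    """Estimate an absorption wave number (kK) from an FTm value."""
    return fit["slope"] * ft_m_value + fit["intercept"]


fit = linear_fit(ft_m, nu_vapor)
for key, value in fit.items():
    print(f"{key}: {value:.3f}")
```

Run on these five compounds, the fit agrees to the reported precision with the vapor-phase FTm equation above (slope about -0.008, intercept about 41.454). `estimate_nu` then applies the fitted line to a new FTm value, which is only meaningful for compounds structurally similar to those in the regression.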


Conclusions 

This study has demonstrated the excellent correlation of the principal
ultraviolet absorption maxima for compounds in several chemical classes
with their corresponding unitary molecular structure indices. In
multicyclic ring systems, the FTm served also as a discriminator of
topological structure groupings. In this series there was little
dependence on the method of structure optimization (molecular mechanics
or quantum chemical) in respect to the magnitude of the correlation
coefficients. Also in this class, the FTe, the FTc, and the Mn provided
satisfactory correlations in some of the FTm-delineated groups.

In the series of conjugated dienes, the FTm, FTe, and Mn gave good
correlations while FTc and the electronic moment (Me) were less
satisfactory.

In the series of 24 substituted benzenes the FTm, as a structure
discriminator, showed that the halobenzenes may be treated as a separate
class with excellent correlations based upon the FTm, FTe, and Me
indices. The remainder of the compounds in this series represented a
wide variety of substituent classes and the correlation coefficients
were less satisfactory. However, the general thrust of the results again
suggested that correlations within classical chemical groups, with a
sufficient number of examples in each, should be examined.

In a group of alkyl-substituted nitrobenzenes, the FTm and FTe indices
proved to be the most useful for spectra correlation in both the vapor
phase and in heptane solution, with the FTm being only marginally better
than the FTe, based on the F statistic of the models. But it was also
interesting in this case that unsubstituted nitrobenzene was an outlier;
it was not included in the basis for the equations recommended for
absorption estimation.

For the purpose of estimating the absorption maximum for compounds
similar to those considered herein, statistically robust equations were
generated. For the aromatic hydrocarbons, the FTm was employed although
other indices would serve as well, depending on which topological group
was being considered. For the conjugated dienes, both FTm and Mn may be
used. In the substituted benzene series, FTm, FTe, and Me provide
equations with good statistics for the halobenzenes while, for more
general substituents, the FTm will serve to give preliminary estimates.
In the case of the alkylnitrobenzenes, both FTm and FTe are quite
satisfactory structure surrogates.

Future  work  in  spectra-structure  correlation 
should  utilize  the  molecular  similarity  potential  of 
the  FTm  index  and  investigate  the  other  indices  for 
their performance in that capacity.

ABSTRACT: Dexanabinol, a dihydroxylated synthetic cannabinoid, is a member of
the nonpsychotropic (+)-3S,4S enantiomeric series. Experimental evidence suggests that
dexanabinol might form aggregates (e.g., dimers) in which the two OH (a phenol and an
allylic alcohol) groups are involved in hydrogen bonding. The extremely low solubility of
dexanabinol in water implies that this interaction may not involve solvent molecules. A
theoretical study of this phenomenon in the framework of the PM3 molecular orbital
approximation is described. Simple molecular models (phenol and 6-cyclohexene-1-methanol)
were initially examined, followed by extension of the calculations to
dexanabinol. The results indicate that dimers of dexanabinol resulting from hydrogen
bonding are more stable than the isolated molecules, with the differences attributed to
hydrogen bonding energies. It is suggested that the phenolic hydroxy group of one
molecule forms a hydrogen bond with the allylic OH group of the second molecule and
vice versa, resulting in dimers which contain two hydrogen bonds. The hydrogen bonds
are more stable (6.14 kcal/mol) and the complex formed is more favored energetically
when the phenol groups act as hydrogen bond donors and the allylic OH groups as
acceptors. These interactions are also energetically more favored than those between
dexanabinol and water (3.70 kcal/mol). The dexanabinol dimer manifested a lower
dipole moment as compared to the monomer (1.211 vs. 2.221 debye) as well as a much
larger log P (11.16 vs. 5.90), indicating strong hydrophobic character. The optimized
structure shows that the OH groups involved in hydrogen bonds are oriented to the
interior of the dimers, while the lipophilic side chains are oriented toward the exterior.
These properties of the dimer may explain the low water solubility of dexanabinol.

©  1997  John  Wiley  &  Sons,  Inc.  Int  J  Quant  Chem  65:  1057-1064,  1997 


Correspondence  to:  E.  Pop. 




POP  AND  BREWSTER 


Introduction 

Dexanabinol (HU-211), the (+)-3S,4S-5'-(1',1'-dimethylheptyl)-Δ6-7-hydroxy-
tetrahydrocannabinol, is a nonpsychotropic, synthetic cannabinoid
currently in clinical development as a neuroprotective agent [1,2]. The
extremely low solubility of this compound in water complicates its
formulation as an intravenous drug. While the bulky lipophilic
1',1'-dimethylheptyl side chain should reduce water solubility, the two
hydroxyl groups present in the molecule of dexanabinol would be expected
to exert a moderating influence. The compound was, however, found to be
practically insoluble in water.

A possible explanation for this behavior is that the OH groups of
dexanabinol cannot form hydrogen bonds with the solvent since they are
already involved in more favorable intramolecular or intermolecular
(dexanabinol-dexanabinol) interactions. Both OH groups present in
dexanabinol (allylic at C-7 and phenol at C-3') can participate in
hydrogen bonding. While intramolecular hydrogen bonding is less probable
due to the large distance between the two OH groups, intermolecular
bonding is predictable [3-6] and can occur in various ways (i.e.,
phenol-phenol and allylic OH-allylic OH, phenol-allylic OH, etc.).
Experimental support for this assumption is provided by the infrared
(IR) spectra of dexanabinol. The shift of the IR frequencies of the
hydroxyl groups from 3590 to 3650 cm⁻¹, typical for free groups, to
lower values (3226 and 3424 cm⁻¹) is indicative of hydrogen bonding
[7,8]. The relatively high melting point of dexanabinol (140-143°C) as
compared to related compounds (e.g., the 6-methyl analog, dexanabinol
pivalate, etc.), which are oils, is also suggestive of stabilization due
to hydrogen bonding. Furthermore, acylation of the phenolic position of
dexanabinol to form the acetate or other esters paradoxically results in
increased water solubility. This observation is consistent with
incipient disruption of hydrogen bonds.

A theoretical study of this phenomenon has been performed. The PM3
molecular orbital approximation [9,10] was used for this purpose. The
paradigm of the study included a comparison of the thermodynamic
stability, as reflected by the calculated heat of formation (ΔHf), of
the dexanabinol monomer to the stability of a dimer resulting from
double hydrogen bond formation between two molecules of dexanabinol. If
the energy of the dimer were found to be lower than the sum of the
energies of two isolated monomers, dimerization should be energetically
favored. Moreover, the difference in energy should be a measure of the
energy of the hydrogen bonding. Since the computations are rather
complex, with over 300 orbitals to be calculated, simple models (phenol
and 6-cyclohexene-1-methanol) were used for initial studies. Vibrational
spectra were calculated only for these models. Other physical properties
relevant to solubility (dipoles, log P) were also evaluated.


Methods 

Theoretical studies were performed using the PM3 molecular orbital
approximation [9,10] as included in the HyperChem (Hypercube, Inc.,
Waterloo, Ontario, Canada) version 5.0 software [11], run on a Pentium
Digital computer. PM3 uses a set of parameters derived from a larger
number and variety of experimental versus calculated molecular
properties, as compared to other semiempirical methods, including the
AM1 procedure [12]. Typically, nonbonded interactions are less repulsive
in the PM3 procedure [11]. Molecular models were constructed with the
model builder of HyperChem. Geometry optimization was completed using
the Polak-Ribiere conjugate gradient algorithm. The restricted
Hartree-Fock (RHF) method was applied to the calculation of wave
functions. Hydrogen bonds were displayed after molecule pairs were
arranged so that the required conditions (hydrogen donor-acceptor
distance less than 3.2 Å and the angle made by covalent bonds to the
donor and acceptor atoms less than 120°) were fulfilled. Log P values
were calculated using a nonlinear regression model in which all the
descriptors used (molecular surface, volume, weight, etc.) are
determined from the fully optimized structures [13]; the model is
included in the QSAR package of the ChemPlus™ (version 1.5) extension
for HyperChem.


Results  and  Discussion 

FIGURE 1. Structures of phenol (1), 1-cyclohexene-1-methanol (2), and two complexes
(3 and 4) resulting from hydrogen bonding.

The hydrogen bonding in the selected model systems, i.e., phenol (1) and
6-cyclohexene-1-methanol (2) (Fig. 1), was considered in the initial
study, and heats of formation (ΔHf) were determined for the optimized
structures. Molecules 1 and 2 were then arranged so that the hydrogen
bonding conditions were fulfilled. In model 3, the allylic OH served as
the hydrogen bond donor and the phenol as the acceptor, while in model 4
the phenol was the hydrogen bond donor. Hydrogen bonds were displayed
using the HyperChem "show hydrogen bonds" and "recompute hydrogen bond"
options. Both dimers were then reoptimized using the PM3 Hamiltonian.
The calculated heats of formation of the complexes were compared to
those of the monomers (Table I). In both cases, ΔHf of the complex
resulting from hydrogen bonding was lower than the sum of the ΔHf values
of 1 and 2. It is reasonable to assume that these differences (ΔΔHf),
calculated according to Eq. (1), represent the energies of the hydrogen
bonding (the same results can be obtained by using the total binding
energies of 1, 2, 3, and 4, presented in Table I, instead of the ΔHf
values for these calculations). The results (Table I) indicate that the
hydrogen bond energy was only 1.48 kcal/mol in the case of 3, indicating
the formation of a weak hydrogen bond, but 6.63 kcal/mol for 4,
indicating a stronger hydrogen bond:

$$\Delta\Delta H_f = \Delta H_f(\mathrm{I}) + \Delta H_f(\mathrm{II}) - \Delta H_f(\mathrm{I+II}), \qquad (1)$$

where ΔHf(I) is the heat of formation of phenol, ΔHf(II) the heat of
formation of 1-cyclohexene-1-methanol, and ΔHf(I+II) the heat of
formation of the dimers (3 or 4).
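The arithmetic behind Eq. (1) can be checked directly from the Table I entries. The short sketch below evaluates the hydrogen bond energy of complexes 3 and 4 twice, once from the heats of formation and once from the total binding energies, as the text states should be equivalent. The function name `hydrogen_bond_energy` and the dictionary layout are illustrative assumptions, not part of the original work.

```python
# PM3 values from Table I (kcal/mol): (binding energy E, heat of formation).
TABLE_I = {
    "1": (-1419.36, -21.85),   # phenol
    "2": (-1931.91, -50.90),   # cyclohexene-1-methanol model
    "3": (-3352.76, -74.23),   # complex, allylic OH as donor
    "4": (-3357.90, -79.38),   # complex, phenol as donor
}


def hydrogen_bond_energy(value_a, value_b, value_complex):
    """Eq. (1): sum for the isolated molecules minus the value for the complex."""
    return value_a + value_b - value_complex


for complex_id in ("3", "4"):
    from_binding = hydrogen_bond_energy(
        TABLE_I["1"][0], TABLE_I["2"][0], TABLE_I[complex_id][0]
    )
    from_heats = hydrogen_bond_energy(
        TABLE_I["1"][1], TABLE_I["2"][1], TABLE_I[complex_id][1]
    )
    print(f"complex {complex_id}: {from_heats:.2f} (heats), {from_binding:.2f} (binding)")
```

Both routes agree to within the rounding of the tabulated values, which is consistent with the parenthetical remark that binding energies may be used in place of heats of formation.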

Net charges and atomic orbital populations are presented in Table II. A
higher degree of polarization of the positive hydrogen and negatively
charged oxygen atoms can be noticed in the case of the hydrogen-bonded
complexes 3 and 4. In the case of 3, a strong participation of the Pz
orbital of the allylic alcohol oxygen (hydrogen donor) was noted, while
in the case of 4, a strong participation of the Pz orbital of the phenol
oxygen (hydrogen donor) can be seen.

A vibrational analysis was performed and the vibrational (IR) spectra
were calculated for the models. No negative frequencies appeared in the
calculated IR spectra, indicating that valid minimum energy structures
were obtained [11]. Recent results [14] indicated that, of the PM3, AM1,
and MNDO methods used for calculating IR frequencies, PM3 showed the
closest correspondence to experimental values (calculated stretching
frequencies are generally about 10% too high). Indeed, calculated values
of the O-H stretching vibrations were 3891 cm⁻¹ for phenol (1) and 3812
cm⁻¹ for the hydrogen-bonded complex 4, indicating a bathochromic shift
like that observed experimentally for hydroxyl groups involved in
hydrogen bonding.

TABLE I

Calculated binding energies (E), heats of formation (ΔHf), and estimated hydrogen bond energies (ΔΔHf) (kcal/mol).

Compound    E (kcal/mol)    ΔHf (kcal/mol)    ΔΔHf (kcal/mol)

1             -1419.36          -21.85             —
2             -1931.91          -50.90             —
3             -3352.76          -74.23            1.48
4             -3357.90          -79.38            6.63
5             -6584.86         -154.05             —
6            -13171.30         -309.69            0.80
7            -13180.23         -320.38            6.14
8             -7026.68         -268.36            3.70

TABLE II

Net charges and atomic orbital electron populations for atoms involved in hydrogen bonding.

                                  Phenol                     Allylic alcohol
Compound                          O            H             O            H

Phenol (1)                        -0.228       0.196         —            —
                                  Px: 1.275    s: 0.803
                                  Py: 1.243
                                  Pz: 1.916

Cyclohexene-1-methanol (2)        —            —             -0.306       0.181
                                                             Px: 1.574    s: 0.819
                                                             Py: 1.227
                                                             Pz: 1.645

Complex 3                         -0.248       —             -0.335       0.209
                                  Px: 1.286                  Px: 1.300    s: 0.791
                                  Py: 1.422                  Py: 1.302
                                  Pz: 1.751                  Pz: 1.921

Complex 4                         -0.271       0.232         -0.335       —
                                  Px: 1.660    s: 0.678      Px: 1.392
                                  Py: 1.418                  Py: 1.830
                                  Pz: 1.891                  Pz: 1.297

Dexanabinol (5)                   -0.231       0.194         -0.308       0.184
                                  Px: 1.361    s: 0.805      Px: 1.349    s: 0.816
                                  Py: 1.333                  Py: 1.377
                                  Pz: 1.743                  Pz: 1.775

Dimer 6                        a: -0.252    a: 0.208      a: -0.330    a: 0.200
                                  Px: 1.545    s: 0.792      Px: 1.250    s: 0.800
                                  Py: 1.692                  Py: 1.754
                                  Pz: 1.222                  Pz: 1.514
                               b: -0.250    b: 0.203      b: -0.335    b: 0.202
                                  Px: 1.379    s: 0.797      Px: 1.464    s: 0.798
                                  Py: 1.318                  Py: 1.406
                                  Pz: 1.759                  Pz: 1.653

Dimer 7                        a: -0.267    a: 0.205      a: -0.333    a: 0.197
                                  Px: 1.285    s: 0.794      Px: 1.741    s: 0.803
                                  Py: 1.872                  Py: 1.251
                                  Pz: 1.315                  Pz: 1.535
                               b: -0.268    b: 0.211      b: -0.333    b: 0.194
                                  Px: 1.287    s: 0.788      Px: 1.583    s: 0.805
                                  Py: 1.837                  Py: 1.345
                                  Pz: 1.350                  Pz: 1.599

Dexanabinol-water complex (8)     -0.264       0.215         -0.350       0.197
                                  Px: 1.378    s: 0.784      Px: 1.209    s: 0.803
                                  Py: 1.336                  Py: 1.387
                                  Pz: 1.754                  Pz: 1.944

Note: a and b represent the two hydrogen bonds and atoms involved, respectively.


FIGURE 2. Structure of dexanabinol (5), two possible dimers (6 and 7) resulting from hydrogen bonding, and the
complex resulting from hydrogen bonding of 5 with water (8). Structures 6-8 are abbreviated.

The study of dimers of dexanabinol formed by hydrogen bonding was
performed using the same rationale: the thermodynamic stability of the
dexanabinol monomer (5) (Fig. 2), as reflected by the calculated PM3
heat of formation (ΔHf), was first determined. The optimized geometry of
5 was then examined to verify the possibility of formation of
intramolecular hydrogen bonds. The distance between the two oxygen atoms
of the OH functionalities was too large (5.939 Å) to form hydrogen
bonds. Two dexanabinol molecules were then arranged so that
intermolecular hydrogen bonding between OH groups could occur. Two
models in which all four OH groups of the two dexanabinol molecules were
involved in hydrogen bonding were built: one (model 6) in which the
allylic OH groups served as the hydrogen bond donors and the phenols as
the acceptors, and another (model 7) in which the phenols were the
hydrogen donors and the allylic OH groups the acceptors. Due to steric
hindrance, models in which the phenol interacted with the juxtaposed
phenol and the allylic hydroxyl with the complementary allylic hydroxyl
group could not be built. Also, models in which only one hydrogen bond
was formed were not considered in this study. The geometries of the
resulting dimers were then reoptimized using the PM3 procedure. The
calculated heats of formation of the dimers were compared with those of
the monomers. The dimers had lower ΔHf values (-309.69 and -320.38
kcal/mol) than the sum of two monomers (-308.1 kcal/mol) (Table I),
indicating that dimers were energetically favored. Equation (2) was used
to calculate the hydrogen bond energies:

$$\Delta\Delta H_f = \frac{2\,\Delta H_f(\mathrm{III}) - \Delta H_f(\mathrm{IV})}{2}, \qquad (2)$$

where ΔHf(III) is the heat of formation of dexanabinol and ΔHf(IV) is
the heat of formation of the dimer 6 or 7.

As in the case of the simpler systems discussed, the hydrogen bonds for
the model in which the allylic OH was the hydrogen bond donor (dimer 6)
were less favored (0.795 kcal/mol) than those in which the phenol was
the hydrogen donor (dimer 7) (6.14 kcal/mol) (Table I). The polarization
of the oxygen and hydrogen atoms involved in the bonding discussed above
is also apparent in this case (Table II); i.e., the oxygen atoms have an
increased negative charge, and the hydrogen atoms have an increased
positive charge as compared to the monomer. In 6, the Py orbitals of the
oxygen atoms have the larger contribution for one of the hydrogen bonds
and the Px and Pz orbitals for the other, while in 7, the Py orbitals
were the major contributors to the highest occupied molecular orbitals
(HOMO) for both oxygen atoms.

By following the same rationale, the hydrogen bonding of the dexanabinol
monomer (5) with two molecules of water has been evaluated. The
calculated (PM3) ΔHf for water is -53.46 kcal/mol. A molecule of water
was added to each of the two OH groups of optimized 5 so that the
hydrogen bond requirements were fulfilled. After the hydrogen bonds were
built, the geometry of the trimolecular system, 8, was reoptimized using
the PM3 approximation. The calculated heat of formation of the
supramolecular system was -268.36 kcal/mol. The energy of the hydrogen
bonds (ΔΔHf) was calculated by using Eq. (3):

$$\Delta\Delta H_f = \frac{\left[\Delta H_f(\mathrm{III}) + 2\,\Delta H_f(\mathrm{H_2O})\right] - \Delta H_f(\mathrm{V})}{2}, \qquad (3)$$

where ΔHf(III) is the heat of formation of dexanabinol, ΔHf(H2O) is the
heat of formation of water, and ΔHf(V) the heat of formation of 8. The
energy of the hydrogen bonds between dexanabinol
and water was found to be 3.70 kcal/mol.
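Equations (2) and (3) share one pattern: the heats of formation of the separated components are summed, the heat of formation of the complex is subtracted, and the result is divided by the number of hydrogen bonds formed (two in both cases). A minimal sketch of that calculation follows, using the PM3 values quoted in the text; the helper name `per_bond_energy` and the constant names are illustrative assumptions rather than anything defined in the original work.

```python
# PM3 heats of formation quoted in the text (kcal/mol).
DHF_MONOMER = -154.05     # dexanabinol (5)
DHF_WATER = -53.46        # water
DHF_DIMER_6 = -309.69     # dimer with allylic OH groups as donors
DHF_DIMER_7 = -320.38     # dimer with phenol groups as donors
DHF_DIHYDRATE = -268.36   # dexanabinol with two water molecules (8)


def per_bond_energy(component_heats, complex_heat, n_bonds=2):
    """Stabilization of the complex relative to its parts, per hydrogen bond."""
    return (sum(component_heats) - complex_heat) / n_bonds


energies = {
    "dimer 6": per_bond_energy([DHF_MONOMER, DHF_MONOMER], DHF_DIMER_6),      # Eq. (2)
    "dimer 7": per_bond_energy([DHF_MONOMER, DHF_MONOMER], DHF_DIMER_7),      # Eq. (2)
    "dihydrate 8": per_bond_energy(
        [DHF_MONOMER, DHF_WATER, DHF_WATER], DHF_DIHYDRATE
    ),                                                                        # Eq. (3)
}

for name, energy in energies.items():
    print(f"{name}: {energy:.3f} kcal/mol per hydrogen bond")
```

With these inputs the per-bond values come out near 0.795, 6.14, and 3.70 kcal/mol, so the phenol-donor dimer 7 is the only arrangement whose hydrogen bonds are stronger than those to water in this gas-phase comparison.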

Due to the large volume of computations required, no vibrational spectra
were calculated for these molecules.

These data suggest that hydrogen bonding stabilizes dexanabinol but that
dimer formation is energetically more favorable than interaction with
water molecules. While the data presented above refer to the gas phase,
thermodynamic stabilities including solvent effects are currently being
examined. Obviously, the presence of water may significantly alter the
energetics of the system. It is possible that, in spite of the stronger
hydrogen bonding associated with the dimer, dilution with a concomitant
increase in water concentration and decrease in dexanabinol
concentration may weaken the dimeric interaction or even lead to
dissociation. On the other hand, the observed poor solubility of
dexanabinol in water, even at very low concentration, is consistent with
the presence of stable dimers that do not readily dissociate.

In this context it is of interest to examine other molecular properties
of dexanabinol related to solubility. Dipoles are important for
solvation, and polar compounds generally manifest better solubility in
polar solvents, such as water. The calculated dipole moment of the
dexanabinol monomer (5) is 2.221 debye, while dimer 7 is quite
symmetrical, having an even lower dipole moment (1.211 debye) (Table
III). The dipole moment of the complex of 5 with two water molecules is
higher (3.499 debye). The dipole moments indicate that the solubility of
dexanabinol in water, especially in the form of the dimer, should be
low. Table III includes data about the geometries of the hydrogen bonds,
such as the distances between the heavy atoms (oxygen) participating in
the hydrogen bonds and the angles formed by the O—H—O atoms. These
parameters were not modified significantly during computations since the
conditions were imposed a priori. The hydrogen is intermediate between
the two participating oxygen atoms, but not equidistant. By examining
the shape of the dimer (Figs. 3 and 4), it can be seen that its
structure is symmetrical, having the polar OH groups engaged in hydrogen
bonding in the interior and the highly lipophilic dimethylheptyl side
chains projected toward the exterior. These types of compounds are not
expected to be soluble in water.

TABLE III

Calculated geometries of hydrogen bonds, dipole moments, and log P.

            O—H—O hydrogen bond
            O—O distance    Angle (O—H—O)    Dipole
Compound    (Å)             (degrees)        (debye)    log P

3           2.783           170.66           2.671       1.37
4           2.767           170.98           2.774       4.27
6           2.776           159.44           2.325      10.83
            2.813           159.69
7           2.769           173.18           1.211      11.52
            2.770           174.87
8           3.159           165.11           3.499       1.34
            2.763           171.41

Log P values (Table III), which are reliable indices for the
lipophilicity of various compounds, indicate that while dexanabinol
itself is lipophilic (log P: 5.90), the lipophilicity of dimer 7 is
extremely high (log P: 11.52). Furthermore, we have begun to look at
explicit interactions between the hydroxyl functions and water. To this
end, the dihydrate 8 was examined as described. Addition of water
molecules to the structure affects log P, as indicated by a calculated
value of 1.34 for the trimolecular system.

In summary, a dimer of dexanabinol resulting from hydrogen bonding is
thermodynamically more favored than the individual molecules or hydrates
of dexanabinol. The dimer is symmetrical, as reflected by a low dipole
moment, and highly lipophilic. Dimers appear to be stable and viable in
the solid state (as suggested by experimental evidence such as high
melting points and bathochromic shifts of the OH IR frequencies) and
possible in the gas phase as well. The poor solubility of dexanabinol in
water indicates that dimers may not dissociate even when present at very
low concentration. The theoretical findings presented above are
supported by experimental evidence indicating poor aqueous solubility of
this important drug candidate.

FIGURE 3. "Balls and cylinders" rendering of the optimized dimer 7.

FIGURE 4. "Balls" representation of dimer 7.

ABSTRACT: We consider the problem of quantitative characterization of the molecular
surface. We start with a set of matrices, the elements of which give interatomic separation
and higher powers of the separations. Averaged row sums of individual matrices
suitably normalized give molecular profiles. The problem that we consider is how to
generalize this approach to 2-dimensional and 3-dimensional objects. By using a large
number of random points distributed over the molecular surface or molecular volume,
respectively, we arrive at matrices from which one can extract invariants that offer a
good characterization of the molecular surface and the molecular volume. It is suggested
that the ratio V/S, where V and S are components of the volume and surface profile for
a molecule, respectively, represents a novel shape index. © 1997 John Wiley & Sons, Inc. Int
J Quant Chem 65: 1065-1076, 1997


Introduction 

Quantitative characterization of molecules remains one of the
central subjects of mathematical chemistry, molecular modeling,
the quantitative structure-activity relationship (QSAR), the
computer manipulation of chemical structure, and
chemical documentation. Most of the approaches
of the past focused on atom-atom connectivity
(when viewing a molecule as a molecular graph)

Dedicated to Mrs. Per-Olov Löwdin, gracious companion of
the Sanibel Symposia.
[1] or molecular geometry (when considering
molecules as 3-D objects) [2]. Such viewing of
molecules corresponds to "ball-and-stick" model
kits, which clearly portray bonding and spatial
distributions of atoms. "Space-filling" model kits,
on the other hand, better illustrate the molecular
surface, crowded atoms, and protein cavities.
Characterization of the molecular surface has received
limited attention in the literature, in part,
no doubt, due to the difficulties involved. As a
result, we come across qualitative, rather than
quantitative, attributes of the molecular surface.
Mezey and collaborators [3] and Arteca [4], e.g.,
partitioned the molecular surface into regions of
different curvature, which leads to regions of
"catchment," bifurcations, saddle points, etc. The
essential step is to associate a molecular configuration
with a molecular surface whose shape
features will characterize the configuration. In
contrast, Bader and co-workers [5] considered the
topography of contours of constant electron density
and partitioned the molecular volume into
regions associated with molecular fragments by
considering the derivatives of the electron density
function. While these are useful for the discussion
of local molecular properties of the molecular surface
or molecular volume, they do not give a numerical
characterization that can be associated with an
individual molecular surface.

Part of the difficulty, as no doubt many have
recognized, is that the characterization of the
molecular surface is intimately connected with the
characterization of the molecular shape. Molecular
shape, in contrast to the molecular volume and
molecular surface, cannot be represented by a single
number. Shape is an elusive concept that has
not yet been well defined and is the least well
characterized one, despite the fact that we all have
a clear notion of what the shape of an object is.
Molecular shape is a critical factor for many molecular
properties. For example, it is the dominant factor
in olfactory sensations, as was already recognized
by Ruzicka in the 1920s [6]. The similarity in
shape, not the similarity in chemical structure,
often determines the odorous quality of a molecule,
as is reflected in the similarity of shapes of civetone,
a macrocyclic compound, and two polycyclic
steroids, which, as Prelog and Ruzicka have shown,
all have a similar characteristic musk odor [7].
Molecular shape is important in circular dichroism
and chirality. Enantiomers, if viewed in isolation,
have all physical properties identical, but when in
an asymmetrical environment (either in the presence
of polarized light of different orientation or in
the proximity of an asymmetrical macromolecule),
they show different behavior, because of the distinction
between their shape and their mirror-image
shape, which becomes critical in a chiral
environment.


Molecular  Shape 

Molecular shape is one of many common chemical
concepts that remain elusive to quantification.
Other concepts of great significance in chemistry
that also are very common and whose characterization
remains elusive or ambiguous include size
[8], complexity,* branching [9], similarity [10], and
structure itself [11]. In each case, we lack a clear
definition or a generally accepted definition of the
concept. Some may argue that these concepts ought
to remain undefined, i.e., qualitative. We feel that
Lord Kelvin was right when he said that [12]

When  you  can  measure  what  you  are  speaking 
about,  and  express  it  in  numbers,  you  know 
something  about  it;  but  when  you  cannot 
measure  it,  when  you  cannot  express  it  in 
numbers,  your  knowledge  is  of  a  meager  and 
unsatisfactory  kind .... 

Some progress has been made toward quantification
of the above-mentioned concepts* [9-11], including
characterization of the molecular shape. A
crude characterization of the molecular shape has
been suggested for the interpretation of chromatographic
retention indices of benzenoid polycyclic
aromatic compounds [13]: the ratio L/B, where L
is the length and B is the width of the smallest
rectangle in which the structural formula of a
benzenoid molecule can be inscribed. There are
other geometrical indices of shapes that have been
outlined in the literature [14,15].

Kier [16] considered molecular graphs, which,
although devoid of strict geometrical information
about a molecule, still allow one to discuss molecular
shapes. Using as the extreme graph shapes the
path graph, which corresponds to a linear chain
structure, and the star graph (such as the molecular
graph of neopentane), Kier arrived at the set of
indices (called κ shape indices) that reflect to
some extent molecular shape rather than molecular
size as most topological indices do. An alternative
approach would be to subtract from the
topological index the index of size or consider a
difference of two topological indices.† Despite the
large number of available topological indices, some
molecular properties are difficult to represent by a
single dominant descriptor. In the case of alkanes,
this is particularly true of critical temperature,
critical pressure, and critical volume. It is of some
interest to observe that the best single variable
simple regression for critical temperature is obtained
using the difference between the connectivity
indices (the coefficient of regression
r = 0.889, the standard error s = 4.59°C) [17]. Similarly,
the descriptor gives the best single
variable regression among 40 tested descriptors for
surface tension in octanes [17]. Both cases suggest
that the property considered, even for molecules of
the same size (isomers of octane), depends on
molecular shape, the characterization of which remains
elusive. Obviously, we need a more precise
characterization of the molecular shape and molecular
surface than was hitherto available.

* Some may think that size is a well understood concept, but
how is one to decide on the relative size of two molecules, one
of which has more atoms of smaller size and the other fewer
atoms of larger size? Is molecular volume rather than the
number of atoms to be a measure of size?

† This avenue has not yet been much investigated.


Shape  Profiles 

Good characterization of a structure need not be
unique but ought to be general enough so that it
applies to structures not previously considered
and even to structures not previously anticipated.
This is different from the requirement for a good
representation of a structure that has to be unique,
so that the structure can be reconstructed from its
code. Codes, however, imply rules for the numbering
of atoms, ordering of attributes, etc. In contrast,
the characterization of a molecule is based on
structural invariants (i.e., mathematical properties
of the structure) and, hence, is independent of
representation, the numbering of atoms, or a selected
pictographic form for the molecular skeleton.
The problem is how to arrive at a general
characterization of the molecular shape and molecular
surface. This problem has been recently addressed
and successfully resolved [18]. This does
not mean that there is no room for alternative
characterizations; it is very likely that they will
emerge. However, the present approach satisfies
the basic requirements of generality and is not
computationally too involved. It has already been
applied to a number of problems requiring characterization
of molecules of different shape [19-22].

The approach is based on molecular geometry.
We start with the distance matrix ${}^{1}D$ in which the
matrix elements $d_{ij}$ are given by the length of the
distance between vertices $i$ and $j$. In the case of
hydrocarbons, the distance can be measured in
units of CC bond length or, in other words, the
element $d_{ij}$ is given as the ratio of the interatomic
distance between vertices $i$ and $j$ and the CC distance.
The row sum $R_i$ of this matrix, when divided by
the number of atoms in a molecule, gives the
average distance from atom $i$ to the rest of the
atoms and represents a characterization of atom $i$.
The average row sum ${}^{1}R$ of this matrix, when
divided by the number of atoms in a molecule,
gives the average distance in the molecule, is a
structural invariant, and represents a characterization
of a molecule.

A single invariant is often not enough for the
characterization of molecules. To arrive at additional
descriptors, we consider next the distance
matrix ${}^{2}D$ in which the matrix elements are given
as $(d_{ij})^2$, where $d_{ij}$ is the already defined distance
between vertices $i$ and $j$. The average row sum ${}^{2}R$
of this matrix divided by the number of atoms in a
molecule gives an additional structural invariant
of the molecule considered. One continues in this
manner and generates the sequence

$$ {}^{1}R,\ {}^{2}R,\ {}^{3}R,\ {}^{4}R,\ {}^{5}R,\ {}^{6}R,\ {}^{7}R,\ \ldots $$

Because the typical entry $d_{ij}$ is larger than 1,
raising matrix elements to a higher power
increases the row sums, which show typical divergence.
To curb the divergence, we normalize the
sequence by factorials, arriving at

$$ {}^{1}R,\ {}^{2}R/2!,\ {}^{3}R/3!,\ {}^{4}R/4!,\ {}^{5}R/5!,\ {}^{6}R/6!,\ {}^{7}R/7!,\ \ldots $$

When  the  distance  matrix  involves  all  atoms  in  a 
molecule,  we  refer  to  the  above  sequences  as  the 
molecular  profile.  If  the  summation  involves  only 
atoms  at  the  molecular  boundary,  we  refer  to  them 
as  shape  profiles. 
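
The construction of a profile can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' program: the function name `molecular_profile`, its parameters, and the reading of each component as the mean row sum divided once more by the number of points are assumptions made here.

```python
import math
import numpy as np

def molecular_profile(coords, max_power=7, unit=1.0):
    """Return [1R, 2R/2!, ..., kR/k!] for points given as an (N, 3) array.

    `unit` rescales distances (e.g. a CC bond length); hypothetical parameter.
    """
    pts = np.asarray(coords, dtype=float) / unit
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))        # distance matrix 1D
    n = len(pts)
    profile = []
    for k in range(1, max_power + 1):
        row_sums = (d ** k).sum(axis=1)          # row sums of the matrix kD
        k_r = row_sums.mean() / n                # average row sum divided by N
        profile.append(k_r / math.factorial(k))  # factorial normalization
    return profile

# Two points one unit apart: every power of the distance equals 1.
print(molecular_profile([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], max_power=3))
```

Passing only the coordinates of the boundary atoms to the same function would give the corresponding shape profile.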

In Figure 1, we illustrate the profiles for two
smaller benzenoids, chrysene and benzphenanthrene,
reproduced from [18], which have the same
number of Kekulé valence structures and the same
count of conjugated circuits [23], which reflects
considerable similarity in their π-electron characteristics.
However, they have different shapes, and
as we see from Figure 1, this is well reflected in
their profiles. Clearly, chrysene, being elongated,
has greater average interatomic separations. In the
case considered, both the molecular (volume) profile
and the shape (boundary) profile are the same,
since all carbon atoms are at the molecular periphery.
In large systems, in particular, in the case of
peri-condensed benzenoids, the two will somewhat
differ [18].

FIGURE 1. Molecular profiles for chrysene and benzphenanthrene.

Extension of molecular profiles to true volume
profiles, i.e., to the characterization of molecular
volume, and, similarly, extension of the shape profiles
to characterize the molecular surface are not
straightforward. Such extensions require a conceptual
"jump" from the characterization of discrete
objects, i.e., objects defined by a finite number of
points in the space, to the characterization of a
continuum. The bridge between the two was narrowed
down with the introduction of bond profiles
[20], a generalization of molecular profiles in
which molecular connectivity is taken into account.
This has been achieved by spacing "ghost"
points along individual bonds in a molecule and
averaging by taking into account also the presence
of additional points in the structure. As the number
of points in each bond is increased, one can
bridge the gap between the discrete representation
and continuum. It has been shown in the case of a
cuboctahedron and a twist-cuboctahedron [21] that
as n, the number of "ghost" points along each
bond, increases, the average bond profiles converge
to a limit that represents a 1-D continuum. The
problem  that  we  will  consider  here  is  how  to 
extend  the  same  procedure  to  a  2-D  continuum 
(and  later  the  molecular  surface)  and  a  3-D 
continuum  (molecular  volume). 
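
The convergence of bond profiles toward a 1-D continuum can be illustrated with a small sketch. The helper names `bond_points` and `mean_distance` and the even spacing of the ghost points are assumptions for illustration, not the procedure of [20,21]. For a single bond of unit length the mean separation should approach 1/3, the mean distance between two random points on a unit segment.

```python
import numpy as np

def bond_points(coords, bonds, n_ghost):
    """Atoms plus n_ghost evenly spaced 'ghost' points on every bond."""
    coords = np.asarray(coords, dtype=float)
    pts = [coords]
    t = np.arange(1, n_ghost + 1) / (n_ghost + 1)   # interior fractions of a bond
    for i, j in bonds:
        pts.append(coords[i] + t[:, None] * (coords[j] - coords[i]))
    return np.vstack(pts)

def mean_distance(points):
    """First profile component: average row sum divided by the number of points."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return d.sum() / len(points) ** 2

atoms = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]   # one bond of unit length
for n in (0, 1, 10, 100, 1000):
    print(n, mean_distance(bond_points(atoms, [(0, 1)], n)))
```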


2-Dimensional  Continuum 

We will outline this generalization by considering
a particular example: the van der Waals contour
of the water molecule (Fig. 2). Clearly, the
model of H2O as a planar system is not adequate,
but one can view this model as one projection of a
3-D water molecule. By using several such projections
(along the lines of the work outlined by Jurs
and collaborators [15]), one can arrive at a useful
characterization of 3-D molecules. In the case of
planar molecules, like most benzenoids, the approach
presented here may suffice as an adequate
characterization of molecules of different shape.

FIGURE 2. Van der Waals contour of H2O molecule.

What we need to do is to implant a large number
of "ghost" points inside the molecular 2-D
boundary and continue with calculating the average
row sums as we did in the case of bond
profiles [20,21]. However, the points have to be
uniformly distributed, which presents no problem
in the case of bonds (particularly bonds of equal
length). This would present insurmountable difficulties
for irregular shapes, and even in the simplest
cases (like a spherical shape, i.e., a circle in a
2-D projection), finding uniform distributions is
impractical. The alternative is to place points at
random, as in a typical Monte Carlo simulation.
Not  only  does  this  make  computations  relatively 
simple,  but  a  random  distribution  is  also  a  better 
guarantee  that  we  need  not  be  concerned  about 
systematic  errors  that  may  result  from  a  regular 
placement  of  points. 
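
A minimal sketch of this random placement, using rejection sampling from a bounding box, is given below. The water geometry (O-H distance 0.9572 Å, H-O-H angle 104.52°), the function names, and the choice to leave distances in ångströms are assumptions for illustration; the radii are those used for the planar H2O model (1.40 and 1.18 Å). Because the normalization of distances is not specified here, the numbers are not expected to reproduce Table I.

```python
import math
import numpy as np

def water_contour():
    """Planar H2O model: circle centers (angstrom) and van der Waals radii."""
    r_oh = 0.9572                        # assumed O-H bond length
    half = math.radians(104.52) / 2.0    # assumed half of the H-O-H angle
    centers = np.array([
        [0.0, 0.0],
        [r_oh * math.sin(half), r_oh * math.cos(half)],
        [-r_oh * math.sin(half), r_oh * math.cos(half)],
    ])
    radii = np.array([1.40, 1.18, 1.18])
    return centers, radii

def sample_inside(centers, radii, n, rng):
    """Draw n random points inside the union of the circles (or spheres)."""
    lo = (centers - radii[:, None]).min(axis=0)
    hi = (centers + radii[:, None]).max(axis=0)
    points = []
    while len(points) < n:
        p = rng.uniform(lo, hi)          # random point in the bounding box
        if np.any(np.linalg.norm(centers - p, axis=1) <= radii):
            points.append(p)             # keep it only if some atom contains it
    return np.array(points)

def profile(points, max_power=5):
    """Factorial-normalized averages of the powers of all pair distances."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    n = len(points)
    return [(d ** k).sum() / n ** 2 / math.factorial(k)
            for k in range(1, max_power + 1)]

centers, radii = water_contour()
rng = np.random.default_rng(0)
points = sample_inside(centers, radii, 500, rng)
print(profile(points))
```

The same `sample_inside` works unchanged for 3-D centers, since the bounding box and the containment test are written without reference to the dimension.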

In Table I, we list the computed profiles for a
planar model of H2O. They are based on van der
Waals radii of 1.40 and 1.18 Å for oxygen and
hydrogen atoms, respectively.

TABLE I
Profile for 2-D model of H2O.

     n  1             2             3             4             5
   500  0.398640E+00  0.982942E-01  0.183261E-01  0.278948E-02  0.361304E-03
  1000  0.403212E+00  0.100323E-00  0.188366E-01  0.288239E-02  0.374793E-03
  1500  0.398258E+00  0.978132E-01  0.181448E-01  0.274585E-02  0.353371E-03
  2000  0.396962E+00  0.970965E-01  0.179322E-01  0.270121E-02  0.345998E-03
  2500  0.397872E+00  0.975155E-01  0.180418E-01  0.272201E-02  0.349153E-03
  3000  0.397603E+00  0.973593E-01  0.179981E-01  0.271362E-02  0.347886E-03
  3500  0.396244E+00  0.966925E-01  0.1782-8E-01  0.267995E-02  0.342805E-03
  4000  0.395699E+00  0.964036E-01  0.177361E-01  0.266226E-02  0.339903E-03
  4500  0.396492E+00  0.967967E-01  0.178464E-01  0.268445E-02  0.343414E-03
  5000  0.396529E+00  0.968175E-01  0.178562E-01  0.268751E-02  0.344094E-03
  5500  0.397276E+00  0.971776E-01  0.179544E-01  0.270688E-02  0.347130E-03
  6000  0.397858E+00  0.974466E-01  0.180241E-01  0.271997E-02  0.349094E-03
  6500  0.397611E+00  0.973311E-01  0.179955E-01  0.271493E-02  0.348390E-03
  7000  0.397567E+00  0.973060E-01  0.179900E-01  0.271436E-02  0.348393E-03
  7500  0.397830E+00  0.974200E-01  0.180180E-01  0.271941E-02  0.349133E-03
  8000  0.398153E+00  0.975688E-01  0.180545E-01  0.272562E-02  0.349942E-03
  8500  0.398352E+00  0.976677E-01  0.180832E-01  0.273168E-02  0.350955E-03
  9000  0.398364E+00  0.976648E-01  0.180802E-01  0.273070E-02  0.350742E-03
  9500  0.398434E+00  0.976951E-01  0.180882E-01  0.273226E-02  0.350983E-03
10,000  0.398096E+00  0.975355E-01  0.180474E-01  0.272492E-02  0.349951E-03
10,500  0.398492E+00  0.977236E-01  0.180969E-01  0.273427E-02  0.351356E-03
11,000  0.398810E+00  0.978771E-01  0.181384E-01  0.274239E-02  0.352625E-03
11,500  0.398946E+00  0.979429E-01  0.181569E-01  0.274619E-02  0.353244E-03
12,000  0.398950E+00  0.979378E-01  0.181534E-01  0.274509E-02  0.353007E-03
12,500  0.399227E+00  0.980663E-01  0.181868E-01  0.275134E-02  0.353946E-03
13,000  0.399804E+00  0.983371E-01  0.182576E-01  0.276470E-02  0.355958E-03
13,500  0.399532E+00  0.982048E-01  0.182215E-01  0.275758E-02  0.354837E-03
14,000  0.400188E+00  0.985232E-01  0.183069E-01  0.277402E-02  0.357345E-03
14,500  0.400262E+00  0.985532E-01  0.183136E-01  0.277507E-02  0.357477E-03
15,000  0.400186E+00  0.985156E-01  0.183030E-01  0.277924E-02  0.357129E-03
15,500  0.400180E+00  0.985120E-01  0.183021E-01  0.277279E-02  0.357116E-03
16,000  0.400043E+00  0.984400E-01  0.182818E-01  0.276873E-02  0.356476E-03


Volume  Profiles 

If the random points are distributed throughout
the molecular interior defined by van der Waals
atomic radii, and the corresponding matrices ${}^{n}D$
are then constructed, from which the averages ${}^{n}R/n!$
are extracted, we arrive at the molecular volume
profile of a molecule. In Table II, we give the
results for the first 10 powers of the separations
using from 500 to 12,500 points. As we see from
Table II, there are variations between the individual
calculations, as there should be for a Monte


Carlo  simulation,  but  these  are  not  excessive,  of 
the  order  of  2%,  thus  affecting  the  third  digit.  If 
higher  accuracy  is  warranted,  one  should  continue 
to  increase  the  number  of  points  or,  what  amounts 
to  the  same  from  a  statistical  point  of  view,  one 
should  start  afresh  and  repeat  the  calculations 
again  and  again.  To  increase  the  accuracy  by  the 
factor  of  10,  one  should  increase  the  number  of 
points  by  the  factor  of  100.  Hence,  the  individual 
computations  reported  here  in  steps  of  500  random 
points  have  to  be  extended  to  50,000  points  (four 
times  more  than  we  choose  to  carry  out)  in  order 
to  visibly  improve  the  accuracy  (by  fixing  the  next 
digit  in  the  individual  profile  components). 
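
The factor-of-100 rule follows from the usual statistical scaling of a Monte Carlo average. The short derivation below is a sketch under the assumption that the error of a profile component behaves as $1/\sqrt{N}$; the symbols $\sigma_N$ and $N$ (the number of random points) are introduced here only for illustration.

```latex
\sigma_N \propto \frac{1}{\sqrt{N}}
\quad\Longrightarrow\quad
\frac{\sigma_{N_2}}{\sigma_{N_1}} = \sqrt{\frac{N_1}{N_2}},
\qquad
\frac{\sigma_{N_2}}{\sigma_{N_1}} = \frac{1}{10}
\;\Longleftrightarrow\;
N_2 = 100\,N_1 .
% With N_1 = 500: N_2 = 100 \times 500 = 50{,}000 = 4 \times 12{,}500.
```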

Table II shows the cumulative effect of Monte
Carlo simulation which, if continued, would show
lesser and lesser oscillations in subsequent rows of
Table II. To determine that the accuracy of
computations increased with an increase of n, the
number of random points used, the entries in
Table II should be compared with those in Table
III, in which each row shows the same profile
components, but each time for a different set of
500 random points.

TABLE II
Profile for 3-D model of H2O.

     n  1             2             3             4             5
   500  0.365020E+00  0.825372E-01  0.141649E-01  0.199341E-02  0.239725E-03
  1000  0.360752E+00  0.806397E-01  0.136990E-01  0.191022E-02  0.227781E-03
  1500  0.359130E+00  0.798093E-01  0.134674E-01  0.186407E-02  0.220512E-03
  2000  0.358998E+00  0.797402E-01  0.134524E-01  0.186187E-02  0.220249E-03
  2500  0.358495E+00  0.795053E-01  0.133968E-01  0.185277E-02  0.219100E-03
  3000  0.359606E+00  0.799771E-01  0.135154E-01  0.187590E-02  0.222420E-03
  3500  0.359646E+00  0.800051E-01  0.135219E-01  0.187560E-02  0.222426E-03
  4000  0.359939E+00  0.801490E-01  0.135607E-01  0.188321E-02  0.223628E-03
  4500  0.360070E+00  0.801950E-01  0.135694E-01  0.188424E-02  0.223701E-03
  5000  0.359460E+00  0.799247E-01  0.135041E-01  0.187298E-02  0.222159E-03
  5500  0.359982E+00  0.801470E-01  0.135570E-01  0.188201E-02  0.223380E-03
  6000  0.358818E+00  0.796593E-01  0.134435E-01  0.186300E-02  0.220839E-03
  6500  0.358502E+00  0.795258E-01  0.134140E-01  0.185846E-02  0.220302E-03
  7000  0.359090E+00  0.797519E-01  0.134630E-01  0.186616E-02  0.221270E-03
  7500  0.359712E+00  0.800149E-01  0.135264E-01  0.187737E-02  0.222864E-03
  8000  0.359283E+00  0.798336E-01  0.134831E-01  0.186982E-02  0.221807E-03
  8500  0.359194E+00  0.797893E-01  0.134709E-01  0.186742E-02  0.221432E-03
  9000  0.359205E+00  0.797873E-01  0.134692E-01  0.186690E-02  0.221319E-03
  9500  0.359247E+00  0.797956E-01  0.134697E-01  0.186677E-02  0.221280E-03
10,000  0.359758E+00  0.800140E-01  0.135219E-01  0.1875806E-02 0.222526E-03
10,500  0.359155E+00  0.797606E-01  0.134626E-01  0.186579E-02  0.221179E-03
11,000  0.358664E+00  0.795377E-01  0.134052E-01  0.185503E-02  0.219565E-03
11,500  0.359039E+00  0.796976E-01  0.134434E-01  0.186163E-02  0.220476E-03
12,000  0.358981E+00  0.796720E-01  0.134362E-01  0.186010E-02  0.220212E-03
12,500  0.359006E+00  0.796870E-01  0.134405E-01  0.186093E-02  0.220335E-03

     n  6             7             8             9             10
   500  0.252854E-04  0.238194E-05  0.203101E-06  0.158384E-07  0.113898E-08
  1000  0.238349E-04  0.222835E-05  0.188636E-06  0.146091E-07  0.104369E-08
  1500  0.228807E-04  0.212049E-05  0.177897E-06  0.136522E-07  0.966411E-09
  2000  0.228518E-04  0.211735E-05  0.177560E-06  0.136173E-07  0.963064E-09
  2500  0.227336E-04  0.210718E-05  0.176824E-06  0.135735E-07  0.961099E-09
  3000  0.231527E-04  0.215295E-05  0.181238E-06  0.139552E-07  0.991039E-09
  3500  0.231395E-04  0.214998E-05  0.180804E-06  0.139048E-07  0.986067E-09
  4000  0.233012E-04  0.216899E-05  0.182794E-06  0.140925E-07  0.100219E-08
  4500  0.233013E-04  0.216817E-05  0.182648E-06  0.140752E-07  0.100053E-08
  5000  0.231238E-04  0.215039E-05  0.181061E-06  0.139469E-07  0.990999E-09
  5500  0.232614E-04  0.216371E-05  0.182191E-06  0.140321E-07  0.996749E-09
  6000  0.229756E-04  0.213577E-05  0.179763E-06  0.138416E-07  0.100219E-08
  6500  0.229241E-04  0.213166E-05  0.179490E-06  0.138269E-07  0.982541E-09
  7000  0.230267E-04  0.214111E-05  0.180263E-06  0.138837E-07  0.986331E-09
  7500  0.232184E-04  0.216121E-05  0.182140E-06  0.140418E-07  0.998488E-09
  8000  0.230930E-04  0.214822E-05  0.180938E-06  0.139413E-07  0.990784E-08
  8500  0.230440E-04  0.214269E-05  0.180387E-06  0.138922E-07  0.986814E-09
  9000  0.230249E-04  0.214004E-05  0.180073E-06  0.138597E-07  0.983845E-09
  9500  0.230181E-04  0.213912E-05  0.179970E-06  0.138495E-07  0.982948E-09
10,000  0.231626E-04  0.215367E-05  0.181270E-06  0.139545E-07  0.990676E-09
10,500  0.230106E-04  0.213879E-05  0.179979E-06  0.138534E-07  0.983465E-09
11,000  0.228072E-04  0.211659E-05  0.177833E-06  0.136670E-07  0.968740E-09
11,500  0.229131E-04  0.212728E-05  0.178794E-06  0.137450E-07  0.974523E-09
12,000  0.228748E-04  0.212257E-05  0.178286E-06  0.136966E-07  0.970389E-09
12,500  0.228898E-04  0.212413E-05  0.178432E-06  0.137088E-07  0.971323E-09
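
The run-to-run scatter for independent sets of 500 points can be quantified directly from the ten first-component values listed in Table III. The sketch below only summarizes those published numbers; the variable names are hypothetical.

```python
import statistics

# First profile component from ten independent runs of 500 random points (Table III).
first_component = [
    0.365020, 0.356483, 0.355885, 0.358604, 0.356482,
    0.365165, 0.359886, 0.361990, 0.361113, 0.353973,
]

mean = statistics.mean(first_component)
spread = statistics.stdev(first_component)   # sample standard deviation
print(f"mean = {mean:.6f}")
print(f"std  = {spread:.6f}  ({100 * spread / mean:.2f}% of the mean)")
```

The relative spread printed here can be compared with the variation "of the order of 2%" quoted for the individual calculations, and the mean with the cumulative values in the later rows of Table II.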


TABLE III
Variations in the computed 3-D profile based each
time on a different set of 500 random points.

     n  1
   500  0.365020E+00
   500  0.356483E+00
   500  0.355885E+00
   500  0.358604E+00
   500  0.356482E+00
   500  0.365165E+00
   500  0.359886E+00
   500  0.361990E+00
   500  0.361113E+00
   500  0.353973E+00


Surface Profiles

To obtain a characterization of the molecular
surface, we restrict random points to the surface
and discard all points that correspond to the
molecular interior. We could first generate, using
pseudorandom numbers, the x, y coordinates of a
point in the molecular interior using the same
geometrical restrictions as used when considering
the molecular volume profile. However, instead of
obtaining the z-coordinate by selecting a third
random number, one should calculate the z-coordinate
so that it is on the molecular surface.
The coordinate system could be oriented so that
half of the molecular van der Waals surface of
water is above the x, y plane (z > 0) and half is
below. One selects one or the other half of the
surface depending on whether the random number
is even or odd. (Other choices can be considered,
such as alternation of the two halves of the surface.)
Such an approach appears suitable when considering
arbitrary shapes. However, since we have represented
each atom by a sphere (having the van der
Waals radius for the atom considered), for our
approach, it is better to use polar coordinates rather
than Cartesian coordinates. The following are the
steps that we performed in order to obtain uniformly
distributed random points over the molecular
surface.

1. Each atom is assigned a percent surface area.
The obtained percentages delineate individual
atoms, and when the first random number
is generated, this leads to the selection of
one of the atoms in a molecule. The corresponding
probabilities are given as
$P_n = R_n / (\sum_j R_j)$, where $R_j$ are the van der Waals
radii of atoms in a molecule.

2. The next two random numbers determine the
magnitude of the Euler angles $\phi$ and $\theta$ (in
intervals $[0, 2\pi]$ and $[0, \pi]$, respectively). By
knowing the angles $\phi$ and $\theta$, we calculate
the Cartesian coordinates of the point on the
surface of a sphere using

$x_n = r_n \cos\phi \sin\theta$
$y_n = r_n \sin\phi \sin\theta$
$z_n = r_n \cos\theta$.

3. Having a point on one of the atomic spheres,
we first find other spheres that overlap with
the one considered. For example, if the random
number selects one of the hydrogen
atoms of H2O, then only the oxygen atom
overlaps the hydrogen sphere, while the other
hydrogen sphere is disjoint.

4. Next, we check if the obtained point
$(x_n, y_n, z_n)$ is inside any of the other overlapping
spheres. If it is, the point is discarded since it
belongs to the overlapping region; if not, it is
added to the list of random points representing
the molecular surface.
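
The four steps above can be sketched as follows. This is an illustration, not the authors' code, and it makes two labeled assumptions that differ from a literal reading of the steps: atoms are selected with probability proportional to sphere area (radius squared), and cos θ rather than θ is drawn uniformly, a standard way to obtain points uniform on each sphere. The water geometry and all function names are likewise assumed.

```python
import math
import numpy as np

def water_spheres():
    """3-D H2O model: sphere centers (angstrom) and van der Waals radii."""
    r_oh = 0.9572                        # assumed O-H bond length
    half = math.radians(104.52) / 2.0    # assumed half of the H-O-H angle
    centers = np.array([
        [0.0, 0.0, 0.0],
        [r_oh * math.sin(half), r_oh * math.cos(half), 0.0],
        [-r_oh * math.sin(half), r_oh * math.cos(half), 0.0],
    ])
    return centers, np.array([1.40, 1.18, 1.18])

def sample_surface(centers, radii, n, rng):
    """Draw n random points on the exposed surface of the fused spheres."""
    weights = radii ** 2 / (radii ** 2).sum()       # step 1 (assumed: area weights)
    points = []
    while len(points) < n:
        a = rng.choice(len(radii), p=weights)       # step 1: pick an atom
        phi = rng.uniform(0.0, 2.0 * math.pi)       # step 2: angles
        cos_t = rng.uniform(-1.0, 1.0)              # assumed: uniform in cos(theta)
        sin_t = math.sqrt(1.0 - cos_t ** 2)
        p = centers[a] + radii[a] * np.array(
            [math.cos(phi) * sin_t, math.sin(phi) * sin_t, cos_t])
        others = np.arange(len(radii)) != a         # steps 3-4: test other spheres
        buried = np.linalg.norm(centers[others] - p, axis=1) < radii[others]
        if not buried.any():
            points.append(p)                        # keep exposed points only
    return np.array(points)

centers, radii = water_spheres()
points = sample_surface(centers, radii, 500, np.random.default_rng(0))
print(points.shape)
```

Testing the point against every other sphere gives the same result as first listing the overlapping spheres (step 3), because a disjoint sphere can never contain the point.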

When considering the molecular volume, the
approach is modified by selecting an additional
random number which determines the r-coordinate
of a point (in the domain 0 < r < r_n) so
that it is inside the atomic sphere.
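
A sketch of this volume variant is given below. Beyond what the text states, it assumes that the radial coordinate is drawn as r = r_n u^(1/3) with u uniform on (0, 1), that atoms are weighted by sphere volume, and that a point lying in the overlap of several spheres is kept only when it was generated from the lowest-numbered of them, so that overlap regions are not counted twice. These choices, the water geometry, and the function names are illustrative assumptions.

```python
import math
import numpy as np

def sample_volume(centers, radii, n, rng):
    """Draw n random points inside the union of the atomic spheres."""
    weights = radii ** 3 / (radii ** 3).sum()         # assumed: volume weights
    points = []
    while len(points) < n:
        a = rng.choice(len(radii), p=weights)
        r = radii[a] * rng.uniform() ** (1.0 / 3.0)   # radial coordinate, 0 < r < r_n
        phi = rng.uniform(0.0, 2.0 * math.pi)
        cos_t = rng.uniform(-1.0, 1.0)
        sin_t = math.sqrt(1.0 - cos_t ** 2)
        p = centers[a] + r * np.array(
            [math.cos(phi) * sin_t, math.sin(phi) * sin_t, cos_t])
        # Keep the point unless a lower-numbered sphere also contains it.
        if not np.any(np.linalg.norm(centers[:a] - p, axis=1) < radii[:a]):
            points.append(p)
    return np.array(points)

# Assumed H2O geometry (angstrom) with the van der Waals radii used in the text.
half = math.radians(104.52) / 2.0
centers = np.array([
    [0.0, 0.0, 0.0],
    [0.9572 * math.sin(half), 0.9572 * math.cos(half), 0.0],
    [-0.9572 * math.sin(half), 0.9572 * math.cos(half), 0.0],
])
radii = np.array([1.40, 1.18, 1.18])
points = sample_volume(centers, radii, 500, np.random.default_rng(0))
print(points.shape)
```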

In Table IV, we list the derived molecular surface
profiles for H2O using the first 10 powers of
the profile matrices. For the same n, we obtain a
denser distribution of the points on the surface than
was the case with the density of the points when
calculating the molecular volume. Hence, we expect
smaller variations in the computed molecular
profiles for the same n. As we see from Table IV,
we have convergence in the third digit as we
approach 5000 random points.

TABLE IV
H2O molecule surface profile.

     n  1             2             3             4             5
   500  0.510572E+00  0.149417E+00  0.313542E-01  0.517469E-02  0.707465E-03
  1000  0.512496E+00  0.150115E+00  0.315247E-01  0.520770E-02  0.707465E-03
  1500  0.512868E+00  0.150280E+00  0.315743E-01  0.521905E-02  0.714813E-03
  2000  0.512192E+00  0.149838E+00  0.314310E-01  0.518727E-02  0.709403E-03
  2500  0.511507E+00  0.149411E+00  0.312797E-01  0.514962E-02  0.702193E-03
  3000  0.511120E+00  0.149214E+00  0.312281E-01  0.514020E-02  0.700874E-03
  3500  0.510536E+00  0.148884E+00  0.311281E-01  0.511884E-02  0.697292E-03
  4000  0.510601E+00  0.148870E+00  0.311173E-01  0.511576E-02  0.696695E-03
  4500  0.511103E+00  0.149135E+00  0.311957E-01  0.511324E-02  0.699474E-03
  5000  0.511314E+00  0.149278E+00  0.312449E-01  0.514388E-02  0.701522E-03

     n  6             7             8             9             10
   500  0.828731E-04  0.851881E-05  0.782251E-06  0.650601E-07  0.495514E-08
  1000  0.835794E-04  0.859940E-05  0.790163E-06  0.657337E-07  0.500501E-08
  1500  0.838940E-04  0.863988E-05  0.794691E-06  0.661822E-07  0.504492E-08
  2000  0.831424E-04  0.855130E-05  0.785593E-06  0.653513E-07  0.497638E-08
  2500  0.820215E-04  0.840456E-05  0.768981E-06  0.636924E-07  0.482798E-08
  3000  0.818738E-04  0.839112E-05  0.768003E-06  0.636399E-07  0.482671E-08
  3500  0.813750E-04  0.833141E-05  0.761713E-06  0.630470E-07  0.477604E-08
  4000  0.812838E-04  0.831970E-05  0.760406E-06  0.629172E-07  0.476441E-08
  4500  0.816691E-04  0.836553E-05  0.765189E-06  0.633627E-07  0.480192E-08
  5000  0.819656E-04  0.840181E-05  0.769038E-06  0.637239E-07  0.483234E-08


Discussion 

Although we illustrated the approach on one of the simplest molecules, H₂O, it is clear from the exposition that the approach is general. All that is required are the coordinates of the atomic centers in a molecule and the van der Waals radii of all atoms. The interpretation of the results is also self-evident: The volume profile characterizes the molecular volume, while, when the random points are restricted to the molecular surface, we obtain a molecular surface profile, a set of numbers that characterizes the molecular surface. If one applies the same procedure to 2-D objects, the volume profiles will correspond to area profiles and the surface profiles will correspond to contour profiles. Hence, we can, within the same framework, consider the characterization of contours representing equal density, or equal potential, whether these are represented in a plane or in 3-D space.

Do the components in the derived volume and surface profiles have other structural interpretations? The average point-to-point distance and the averages of the corresponding powers of point-to-point distances provide a clear mathematical explanation for the numerical values derived. One should also not forget that the process of averaging the row contributions of the matrix is associated with a loss of information. Hence, there is no guarantee that the derived profiles will be unique or that one could reconstruct the structure from a given profile. However, as has been illustrated with the characterizations of molecules based on graph theoretical invariants (topological indices), although complete reconstruction need not be feasible, one can often narrow the number of structures that can have given values for selected topological indices [23-26]. We can see already from Figure 1 that if the profile that corresponds to chrysene is given (without our knowing to which benzenoid it belongs), we can immediately eliminate benzphenanthrene as a possible candidate, since the magnitudes of its profile components are too small to correspond to the data considered. In a similar fashion, the problem of generation and reconstruction of structures with a specified volume profile or surface profile can be expected to yield a short list of candidate structures.


Novel  Shape  Index 

The volume and surface profiles, when combined, may offer additional insight into the structure considered. If a molecule is spherical, the ratio of volume to surface (V/S) is simply r/3, where r is the radius of the sphere. However, if the object is elongated, such as that obtained by inserting a cylindrical part between the two halves of a sphere (see Fig. 5), the ratio V/S will also be a function of L, the length of the cylindrical part. For example, if we assume that L = r, L = 2r, and L = 3r, we obtain for the ratio V/S 0.388889r, 0.416667r, and 0.433333r, respectively. Clearly, V/S is a function of the shape (here, simply a function of the length of the elongated structure).
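The quoted ratios follow from elementary geometry: a cylinder of length L and radius r capped by two hemispheres has V = (4/3)πr³ + πr²L and S = 4πr² + 2πrL, so V/S = r(4r + 3L) / [6(2r + L)]. The short Python check below reproduces the numbers; the function names are illustrative only.

```python
import math


def capsule_volume(r, length):
    """Volume of a cylinder of the given length capped by two hemispheres."""
    return (4.0 / 3.0) * math.pi * r ** 3 + math.pi * r ** 2 * length


def capsule_surface(r, length):
    """Surface area of the same capped cylinder (sphere plus cylinder wall)."""
    return 4.0 * math.pi * r ** 2 + 2.0 * math.pi * r * length


def v_over_s(r, length):
    """Shape ratio V/S; reduces to r/3 for a sphere (length = 0)."""
    return capsule_volume(r, length) / capsule_surface(r, length)


if __name__ == "__main__":
    r = 1.0
    for m in (0, 1, 2, 3):
        print(f"L = {m}r: V/S = {v_over_s(r, m * r):.6f} r")
    # L = 0r: V/S = 0.333333 r
    # L = 1r: V/S = 0.388889 r
    # L = 2r: V/S = 0.416667 r
    # L = 3r: V/S = 0.433333 r
```

As L grows, V/S approaches the value r/2 of an infinitely long cylinder, so for this family of shapes the index is bounded between r/3 and r/2.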

We can introduce V/S as an index of molecular shape by using computed molecular profiles. In Table V, we list the ratios V/S corresponding to the computed volume and surface profiles of H₂O. Again, as n, the number of points, increases, the oscillatory behavior of the computed ratio V/S should attenuate, although, because both the numerator and the denominator are based on random points, the convergence of the ratio V/S is going to be somewhat slower. Based on 5000 points, we obtain for V/S, when the powers used are k = 1-10, 0.6999, 0.5906, 0.4322, 0.3641, 0.3167, 0.2821, 0.2559, 0.2354, 0.2189, and 0.2051. Although both the volume and the surface profile components of the van der Waals model of H₂O decrease quickly with increasing power k (see Tables II and IV, respectively), the quotient V/S tapers off rather slowly. Thus, while V decreased by a factor of 10⁸ when going from k = 1 to k = 10, the ratio decreased only to about a third of its initial value. This suggests that the new shape descriptors contain a considerable amount of structural information, as they should, in view of the immense variability of shapes for molecules of the same volume and the same surface.


Open  Problems 

We have outlined a general procedure for the construction of a structurally related ordered set of descriptors for molecules viewed as embedded in 3-D space and descriptors for the molecular surface of such molecules. The present work can be viewed as a seminal article, even though it has evolved from profiles derived when atoms are represented as vertices of molecular graphs or points in space. The attribute "general" is indeed appropriate because not only are there no restrictions on the size and conformations of molecules considered, but there are also no restrictions on further modifications of the approach when, instead of molecular volume and molecular surface, other spatial functions of molecular electron density are considered, including the electron density itself. So, on the one hand, random points can be weighted to simulate electron density and, on the other hand, points can be distributed along computed equipotential surfaces. Equally, one can focus attention only on a local molecular environment and derive local molecular descriptors by suitable partitioning of the molecular volume and surface.

FIGURE 3. Representation of van der Waals contour of H₂O molecule by 1000 and 5000 random points, respectively.

FIGURE 4. Three different views of 3-D model of H₂O molecule based on van der Waals radii for oxygen and hydrogen atoms derived by using 5000 random points.

FIGURE 5. Cylinders of different length (L = r, L = 2r, and L = 3r) capped by hemispheres representing objects of similar but different shapes.

There are a number of open problems, technical or not, that deserve attention. It would be useful to establish a preferred number of random points per atom that would suffice in typical structure-activity studies. As we see from this work, about 10³ random points define the components of the profiles to about 1 part in 10³. This, of course, will suffice for most data reduction problems where experimental accuracy is at the level of 1%, but, occasionally, higher precision may be required. In contrast to some quantum chemical computations, we are not (yet) approaching hardware computational restrictions, since the computational effort increases approximately linearly with N, the number of random points.

TABLE V
The quotient V/S (rows: number of random points n; columns: power k).

n      k = 1    k = 2    k = 3    k = 4    k = 5
500    0.7126   0.5524   0.4518   0.3852   0.3389
1000   0.7033   0.5372   0.4345   0.3668   0.3196
1500   0.7013   0.5311   0.4269   0.3572   0.3085
2000   0.7002   0.5322   0.4280   0.3589   0.3105
2500   0.6780   0.5322   0.4283   0.3598   0.3120
3000   0.7018   0.5360   0.4328   0.3648   0.3173
3500   0.7016   0.5374   0.4344   0.3664   0.3190
4000   0.7020   0.5384   0.4358   0.3681   0.3210
4500   0.7015   0.5377   0.4350   0.3671   0.3198
5000   0.6999   0.5906   0.4322   0.3641   0.3167

n      k = 6    k = 7    k = 8    k = 9    k = 10
500    0.3051   0.2796   0.2596   0.2434   0.2299
1000   0.2852   0.2591   0.2387   0.2222   0.2085
1500   0.2727   0.2454   0.2239   0.2063   0.1916
2000   0.2749   0.2476   0.2260   0.2084   0.1935
2500   0.2772   0.2507   0.2299   0.2131   0.1991
3000   0.2828   0.2566   0.2360   0.2193   0.2053
3500   0.2844   0.2581   0.2374   0.2205   0.2065
4000   0.2867   0.2607   0.2404   0.2240   0.2103
4500   0.2853   0.2592   0.2387   0.2221   0.2084
5000   0.2821   0.2559   0.2354   0.2189   0.2051

Of immediate interest is to investigate the sensitivity of the computed profiles to geometrical variations of flexible molecules and variations between profiles of conformers, rotational or positional. Another topic of immediate interest concerns the application of the approach to structure-property-activity studies. These are, no doubt, to follow soon, because our model for the first time allows researchers to quantify the visual impressions of molecular models observed either on computer screens in virtual space-time or when "playing" with them in real space-time. These molecular models can be "ball-and-stick" as well as space-filling models, both of which are often combined in current molecular computer graphics. The former correspond to the atom-point representation, and the latter, to the van der Waals representation of atoms. But other molecular models can also be considered with the present scheme. For example, it should be possible to quantify, i.e., "translate," an image into a numerical characterization, such as those of the strands of β-sheets or α-helices in proteins. Moreover, one can expect that in the not so distant future software may appear which will "read" stereoviews of molecules (including complex structures like proteins) and produce their profile characterizations, either as "bond" profiles or space-filled profiles. This task is beyond our interest (i.e., beyond our expertise) and we do not even wish to speculate whether it presents a challenge or not for an experienced computer software expert. But those looking for a challenge may consider another problem that has barely been "touched" in mathematical chemistry: the inverse structure problem. The question is: How and to what extent can one reconstruct a structure given the collection of its invariants? The pioneering work in this direction has received attention only in recent years, through the reconstruction of molecular graphs from several well-known topological indices [23-26].

ABSTRACT: We studied the electronic structure, electronic excitation energies, charge distribution, and magnetic properties of three dinuclear copper complexes, [{[L-Cu]₂O₂}²⁺, L = 1,4,7-triazacyclononane] (1), [{[L-Cu]₂O₂}²⁺, L = hydrotris(pyrazolyl)borate] (2), and {[(NH₃)₃-Cu]₂O₂}²⁺ (3), in two core isomers. The theoretical model complex 3 is sufficient to describe the qualitative electronic structure and related properties of 1 and 2. The main features of the electronic structure are similar between the three systems; the electronic excitation energies between copper- and oxygen-based orbitals are insensitive to ligand effects. In addition, we studied the energetics of mononuclear complexes to determine the mechanism of the formation of 1. In the suggested mechanism the end-on mononuclear complex forms first, followed by an isomerization between the trans-μ-1,2-O₂, μ-η²:η²-O₂, and (μ-O)₂ dinuclear isomers.

©  1997  John  Wiley  &  Sons,  Inc.  Int  J.  Quant  Chem  65:  1077-1086,  1997 


Introduction 

Recently, much attention has focused on metalloenzymes responsible for cleaving and forming the dioxygen O-O bond as well as on those which bind dioxygen reversibly. Most of these enzymes contain one, two, or more metal centers in their active sites with common core structures in their oxygenated form of either M₂(μ-η²:η²-O₂), A, or M₂(μ-O)₂, B, where M = Mn, Fe, or Cu. A tetranuclear manganese cluster which consists of two Mn₂(μ-O)₂ units (structure B) is the active site of photosystem II, which is responsible for the dioxygen evolution from water. The reverse reaction, the cleavage of the dioxygen O-O bond, occurs at dinuclear iron and copper sites of several enzymes with a core structure A.

Contract grant sponsor: National Research Council, Canada.


[Structural diagrams of the core structures A, M₂(μ-η²:η²-O₂), and B, M₂(μ-O)₂]


Based on the structural similarity between A and B, it seems reasonable that the general mechanism for oxygen evolution and dioxygen activation in metalloenzymes involves the interconversion between A and B [1]. Tolman and co-workers recently reported the synthesis of N-substituted derivatives of [{[L-Cu]₂O₂}²⁺, L = 1,4,7-triazacyclononane] [2] (1), which reversibly bind dioxygen and reversibly break the O-O bond through the isomerization of their Cu₂(μ-η²:η²-O₂), A, and Cu₂(μ-O)₂, B, core structures [3]. One intriguing aspect of this discovery was that other dinuclear copper complexes with core structure A have similar structural, magnetic, and spectroscopic properties, but their chemistry is different. Of particular interest are the N-substituted derivatives of [{[L-Cu]₂O₂}²⁺, L = hydrotris(pyrazolyl)borate] (2) synthesised by Kitajima et al. [4].

In a recent study we showed that the binding energies of dioxygen in 1 and 2 are clearly different, namely -60 and -184 kJ/mol, respectively [5]. These results explained why 1 binds dioxygen reversibly and 2 irreversibly. Further, we have shown that the isomerization energetics of 1 and 2 are similar in spite of the different binding energies. Chemical properties are determined by the total energies of the complex, to which the ligands make a significant contribution, while structural, spectral, and magnetic properties are more localized and mainly reflect the chromophore.

The question remains, therefore, whether the same level of theory which gave different total binding energies for 1 and 2 can reproduce the similarities in the properties related to the electronic structure of the Cu₂O₂ chromophore. This question is addressed in the present study. Previous theoretical modeling of 1 and 2 was based on a model complex, {[(NH₃)₃-Cu]₂O₂}²⁺ (3). To study to what extent the theoretical model can be simplified without sacrificing its capability to reproduce these properties, we also carried out calculations on 3. In addition, we address the binding energetics and properties of mononuclear dioxygen copper complexes related to 1 to examine the binding mechanism of dioxygen.

For the discussion of spectroscopic properties it is also of interest to make reference to the observed spectral properties of the oxy-hemocyanin and oxy-tyrosinase active sites. 1 and 2 serve as inorganic mimics of the active sites of these two enzymes in core isomer A, and their structural, magnetic, and spectral properties are also similar [6, 7]. High-resolution X-ray crystallography indicates that the peroxide is perpendicularly bridging the Cu-Cu axis in a symmetrical fashion, which gives rise to a Cu₂(μ-η²:η²-O₂) core structure. Strong antiferromagnetic coupling between the two d⁹ copper centers through a superexchange mechanism [8] leads to a diamagnetic, electron paramagnetic resonance (EPR) silent, singlet ground state [9]. The O-O stretching frequency is significantly downshifted from the dioxygen frequency to 725-760 cm⁻¹ and is close to a typical peroxide stretching frequency. There are two characteristic absorption bands at 350 nm (ε ≈ 20,000 M⁻¹ cm⁻¹) and 550 nm (ε = 1000 M⁻¹ cm⁻¹) and a feature in the circular dichroism spectrum at 480 nm (Δε = +2.5 M⁻¹ cm⁻¹) with no corresponding feature in the absorption spectrum. The Cu-Cu, O-O, and Cu-O distances are 3.6, 1.41, and 1.9 Å, respectively.

As one of the first theoretical studies, Solomon and co-workers carried out self-consistent-field Xα scattered-wave (SCF-Xα-SW) calculations [9] to interpret the peroxide → Cu charge-transfer transitions using Slater's transition-state method [10]. Most recently, Tuczek and Solomon presented a valence bond configuration interaction (VBCI) model to explain the charge-transfer energy splitting and excited-state antiferromagnetism in bridged transition-metal dimers with applications to oxy-Hc [9c].

Recently, Bernardi and co-workers reported ab initio and density functional calculations on oxy-hemocyanin models including 3 [11] and on the binding process of triplet oxygen to hemocyanin [12]. Eisenstein and co-workers applied extended Hückel theory (EHT) [13] to a model system with two copper cations ligated by six imidazole rings and bridged by an oxygen molecule (peroxide). This study was recently extended by Getlicherman et al. to the eclipsed arrangement of the ligands, which was found to be isoenergetic with the staggered conformation [14].

Theoretical studies of the Cu₂(μ-O)₂ complexes have appeared recently, prompted by the discovery of this core isomer. Mahapatra and co-workers complemented the experimental work with minimum basis set restricted Hartree-Fock (HF) calculations and with the broken symmetry Xα method on 3 [2b]. The broken symmetry calculations converged to the symmetrical solution, and the calculated Cu-Cu and O-O distances and O-O vibrational frequency (from HF calculations) were in reasonable agreement with the experimental data of the related inorganic complex. Cramer and co-workers described the core isomerization of 3 by minimal basis set HF geometry optimization followed by CASPT2 single point calculations [15].

In this study we apply density functional theory (DFT) at the generalized gradient approximation (GGA) level to calculate the properties of 1, 2, and 3 (see Computational Details). Gradient-corrected DFT has been shown to provide results in excellent agreement with high-level ab initio calculations on simple theoretical model systems of hemocyanin. Further, this method includes dynamical electron correlation, which was neglected in most previous theoretical studies (HF, CASSCF, Xα, EHT) and which was shown to be essential for the proper description of these systems. Density functional theory has been a remarkably successful tool in modeling energetics, structures, and spectroscopy of transition metal systems [16].


Computational  Details 

The reported calculations were carried out using the Amsterdam Density Functional (ADF) program system, version 2.0.1, derived from the work of Baerends et al. [17] and developed at the Free University of Amsterdam [18] and at the University of Calgary [19]. All optimized geometries in this study were calculated based on the local density approximation [20] (LDA) augmented with gradient corrections to the exchange [18b] and correlation [18c] potentials. The geometries were optimized with the geometry optimization by direct inversion in the iterative subspace (GDIIS) technique [21] using natural internal coordinates [22]. We have combined the ADF program with the GDIIS program [23] and previously implemented the skeletal internal coordinates [24]. The internal coordinates were generated by the INTC program [25] and augmented by hand.

The atomic orbitals on copper were described by an uncontracted triple-ζ Slater-type orbital (STO) basis set [26], while a double-ζ STO basis set was used for carbon, nitrogen, oxygen, and hydrogen; a single-ζ polarization function was used on all atoms. The 1s² configuration on carbon, nitrogen, and oxygen as well as the 1s²2s²2p⁶ configuration of copper were assigned to the core and treated by the frozen-core approximation. A set of auxiliary s, p, d, f, g, and h STO functions, centered on all nuclei, was used in order to fit the molecular density and represent the Coulomb and exchange potentials accurately in each SCF cycle [27].

DINUCLEAR  COMPLEXES 
Core  Structure 

For the successful prediction of the chromophore properties, it is essential that the calculation reproduces the experimental geometrical parameters. The calculated Cu-Cu distance of isomer A of 1 and 2 is 3.70 Å, while the experimental counterpart is 3.6(1) Å. The O-O distances of 1A and 2A are 1.47 and 1.46 Å, respectively, while the experimental value is 1.41 Å. Considering the challenging task of calculating geometries of bridged dinuclear metal systems and the large experimental error bars, the agreement between theory and experiment is good. More details on the geometry are given in Ref. [5].

ELECTRONIC  STRUCTURE 

Solomon et al. studied the electronic structure and electron transition energies of side-on and end-on bonded dioxygen complexed to copper monomer and dimer complexes by broken symmetry SCF-Xα-SW calculations [9a]. One important question is whether the electronic structure is fully delocalized or a localized broken symmetry state. The calculations on 3A by Solomon et al. converged to a broken symmetry solution, but in the series of dicopper peroxo complexes considered, this system had the most delocalized electron spin density, indicating the most strongly coupled system. We found that broken symmetry gradient-corrected density functional calculations on the A isomer of 1, 2, and 3 converged to the symmetrical, fully delocalized solution for all model systems. Our results indicate that the ground state of oxy-hemocyanin mimics can be well described by a restricted single determinant molecular orbital (MO) model. We note here that other functionals may yield broken symmetry solutions, especially those which contain mixed Hartree-Fock exchange terms.

The delocalized MOs are significant for the technical details of the calculation, but they do not change the qualitative orbital descriptions fundamentally. In fact, most features of the orbital interactions and orbital energy levels are very similar to those based on broken symmetry SCF-Xα-SW calculations. Since the qualitative orbital interactions have previously been discussed by Solomon et al. [9] and Bernardi et al. [11] on the basis of SCF-Xα-SW and DFT calculations, respectively, here we restrict our discussion to the orbital correlation between the two core isomers and to the comparison between the different complexes. Figure 1 shows the correlation diagram between the two core isomers of 1, 2, and 3. These orbital diagrams are shifted so that the highest occupied molecular orbital (HOMO) of isomer A is at the same level for all three models. Our symmetry designations refer to C₂ₕ symmetry in standard orientation, with the z axis along the O-O bond (see Fig. 2).

The π* orbital is strongly stabilized by the bonding interaction with the copper-based dxz orbital, which forms the 7Bg of 3. The lowest unoccupied molecular orbital (LUMO) is the antibonding combination of the same two orbitals. In our calculation the HOMO (9Bg for 3) is a primarily π* orbital with some antibonding mixing of the copper-based dyz orbitals. The corresponding bonding orbital (8Bg of 3) has mainly dyz character. Our description of the HOMO is essentially different from that of the SCF-Xα-SW calculation of Solomon et al. [9a]. They found that the dxy orbitals were stabilized by the σ* of the peroxide, which forms the HOMO of the SCF-Xα-SW wave function. The corresponding 9Au orbital in 3 (not shown) is almost 1 eV below the HOMO of our calculation. Further, our calculation did not show any indication that the peroxide σ* acts as a π acceptor orbital for core isomer A, in contrast to such a finding by SCF-Xα-SW calculations. However, such an interaction is an essential feature of core isomer B. In fact, the most striking difference between the electronic structures of the A and B isomers is the strong stabilization of the σ* orbital in isomer B. All MOs containing π* orbitals of the peroxide also drop in energy, while the 16Bu and the 16Ag of 3, consisting of the appropriate combinations of Cu-based s orbitals, do not change in energy.

Comparing the MO diagrams of the different systems, we find that 3 is a good qualitative representation of the electronic structure of the inorganic mimics. We compared the linear combination coefficients of 1, 2, and 3 and found remarkable similarities in most oxygen- and copper-based orbitals. Most of the differences are related to the extent to which ligand orbitals participate in the binding.

Electronic  Excitation  Energies 

We also compared the electronic excitation energies between the three model systems. Solomon et al. [9] have shown that configuration interactions, which are not included in our calculations, can contribute as much as 20,000 cm⁻¹ to the excitation energies of systems with a Cu₂O₂ core. Therefore, our goal is to test the ability of the theoretical system 3 to reproduce the single-electron excitation energies of the inorganic systems 1 and 2. We calculated the two optically allowed π* → Cu(II) ligand-to-metal charge-transfer transitions of isomer A of 1 and 3 based on the sum method developed by Ziegler et al. [28].

The calculated energies of the lower-energy π* transition for 1 and 3 are 36,000 and 37,500 cm⁻¹, respectively. Similarly, the energies of the higher-energy π* transition are also relatively close in 1 and 3; these are 65,200 and 61,200 cm⁻¹, respectively. However, both π* transition energies are about twice the experimental values of 18,000 and 28,600 cm⁻¹, respectively. SCF-Xα-SW transition energies obtained by Slater's transition state method are 16,000 cm⁻¹ and 66,800 cm⁻¹ for the same two π* transitions, respectively.

Magnetic  Properties 

The ground-state Heisenberg magnetic coupling constant (-2J) for a strongly coupled dinuclear magnetic system is simply the difference between the ground singlet and triplet state energies [9]. Experimentally, hemocyanin and the inorganic mimics studied here are all diamagnetic. Only the lower limit of the -2J constant has been determined experimentally, which is 600 cm⁻¹. SCF-Xα-SW calculations predict the value of the antiferromagnetic coupling to be between 5675 and 11,800 cm⁻¹ [9]. The gradient-corrected DFT calculations on 3 by Bernardi et al. [11] yield 13.34 kcal/mol or 4665 cm⁻¹. Our calculated -2J values of 1 and 3 are 6163 and 3800 cm⁻¹, respectively, while the -2J value of core isomer B is slightly lower, 3387 cm⁻¹ for 1, for example.
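Because the coupling constants and band positions above are quoted in mixed units (kcal/mol, cm⁻¹, nm), a small conversion helper makes the comparison easier to follow. The Python sketch below is illustrative only; the function names are hypothetical, and the constants are the exact SI values of the Planck constant, the speed of light, and the Avogadro constant together with the thermochemical calorie.

```python
# Exact SI defining constants and the thermochemical calorie.
PLANCK = 6.62607015e-34       # J s
LIGHT_SPEED = 2.99792458e10   # cm/s
AVOGADRO = 6.02214076e23      # 1/mol
JOULE_PER_CAL = 4.184


def kcal_per_mol_to_wavenumber(energy):
    """Convert an energy in kcal/mol to wavenumbers (cm^-1)."""
    joule_per_mol = energy * 1000.0 * JOULE_PER_CAL
    return joule_per_mol / (AVOGADRO * PLANCK * LIGHT_SPEED)


def nm_to_wavenumber(wavelength):
    """Convert a wavelength in nm to wavenumbers (cm^-1)."""
    return 1.0e7 / wavelength


if __name__ == "__main__":
    # 13.34 kcal/mol is the value quoted for the DFT result of Ref. [11].
    print(f"{kcal_per_mol_to_wavenumber(13.34):.0f} cm^-1")
    # Absorption bands quoted at 350 and 550 nm.
    for nm in (350.0, 550.0):
        print(f"{nm:.0f} nm = {nm_to_wavenumber(nm):.0f} cm^-1")
```

With these constants 1 kcal/mol corresponds to about 349.8 cm⁻¹, so 13.34 kcal/mol agrees with the quoted 4665 cm⁻¹ to within rounding, and the 350 and 550 nm bands correspond roughly to the experimental transition energies of 28,600 and 18,000 cm⁻¹ cited for the charge-transfer transitions.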

Charge  Analysis 

We  also  investigated  the  charge  transfer  from 
the  Cu(I)L  to  the  dioxygen  in  the  complex.  Table  I 
contains  the  calculated  partial  charges  on  Cu  and 
O  calculated  by  three  different  charge  analysis. 
The  Mulliken  population  analysis  is  the  most 


TABLE I

Partial charges on Cu and O.

Core          Atom   Method         1       2       3
μ-η2:η2-O2    Cu     Mulliken     0.77    0.72    0.71
              Cu     Hirshfeld    0.42    0.40    0.42
              Cu     Voronoi      0.88    0.95    0.91
              O      Mulliken    -0.50   -0.29   -0.45
              O      Hirshfeld   -0.22   -0.20   -0.19
              O      Voronoi     -0.46   -0.45   -0.41

(μ-O)2        Cu     Mulliken     0.91    0.94    0.93
              Cu     Hirshfeld    0.46    0.46    0.49
              Cu     Voronoi      1.03    1.07    1.08
              O      Mulliken    -0.79   -0.77   -0.76
              O      Hirshfeld   -0.35   -0.31   -0.30
              O      Voronoi     -0.70   -0.68   -0.63

straightforward, but its physical meaning is somewhat
questionable. The Hirshfeld analysis provides
partial atomic charges as an integral of the
SCF charge density over space, at each point
weighted by the relative fraction of the initial
density of that atom in the total initial density [29]:

\[
q_{\mathrm{atom}(i)} = \int \frac{\rho^{\mathrm{initial}}_{\mathrm{atom}(i)}}{\sum_{j} \rho^{\mathrm{initial}}_{\mathrm{atom}(j)}} \, \rho^{\mathrm{SCF}} \, dV \tag{1}
\]

The  Voronoi  charge  analysis  consists  of  assigning 
the  charge  density  in  a  point  in  space  to  the  near¬ 
est  atom.  The  Voronoi  cell  of  an  atom  is  the  region 
in  space  closer  to  that  atom  than  any  other. 
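
The two partitioning schemes described above can be illustrated with a minimal one-dimensional sketch. Everything in it is an assumption made for illustration: the "atoms" are toy Gaussian densities on a line, the "SCF" density is simply the free-atom density with 0.2 electron shifted from one atom to the other, and charges are reported as free-atom electron count minus assigned population. It is not the ADF implementation used for Table I.

```python
import numpy as np


def gaussian_density(x, center, electrons, width=0.5):
    """Toy 1D free-atom density: a Gaussian integrating to `electrons`."""
    norm = electrons / (width * np.sqrt(2.0 * np.pi))
    return norm * np.exp(-0.5 * ((x - center) / width) ** 2)


def hirshfeld_populations(x, rho_scf, free_atom_densities):
    """Integrate rho_scf weighted by each atom's share of the initial density."""
    free = np.asarray(free_atom_densities)
    promolecule = free.sum(axis=0)
    # Guard against 0/0 far from all atoms, where every density underflows.
    weights = free / np.maximum(promolecule, 1e-300)
    dx = x[1] - x[0]
    return (weights * rho_scf).sum(axis=1) * dx


def voronoi_populations(x, rho_scf, centers):
    """Assign the density at each grid point to the nearest atom."""
    centers = np.asarray(centers, dtype=float)
    nearest = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    dx = x[1] - x[0]
    return np.array([rho_scf[nearest == a].sum() * dx
                     for a in range(len(centers))])


x = np.linspace(-8.0, 8.0, 3201)
centers = [-1.0, 1.0]
free = [gaussian_density(x, c, 1.0) for c in centers]
# Hypothetical "SCF" density: 0.2 electron moved from atom 0 to atom 1.
rho_scf = gaussian_density(x, -1.0, 0.8) + gaussian_density(x, 1.0, 1.2)

hirshfeld_q = 1.0 - hirshfeld_populations(x, rho_scf, free)
voronoi_q = 1.0 - voronoi_populations(x, rho_scf, centers)
print(hirshfeld_q, voronoi_q)
```

Both schemes conserve the total charge but divide it differently, which is the kind of method dependence visible in Table I.
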

In  spite  of  the  different  ligand  charges,  the  cal¬ 
culated  partial  charges  on  the  Cu  and  O  atoms  of 
the  complexes  are  very  similar  in  all  systems.  This 
is  in  agreement  with  what  is  expected  based  on  the 
similarity  in  the  core  electronic  structure.  The  Hir- 
shfeld  analysis  also  allows  one  to  directly  calculate 
the  amount  of  charge  transferred  from  one  frag¬ 
ment  to  the  other  by  applying  Eq.  (1)  to  fragments 
rather  than  atoms.  On  average  the  total  charge 
transferred from two mononuclear Cu(I)L complexes
to the dioxygen is -0.65 and -1.02 for the
μ-η2:η2-O2 and the (μ-O)2 isomer, respectively,
which indicates a high degree of covalency in
these systems.

MONONUCLEAR SUPEROXO COMPLEXES
AND THE MECHANISM OF OXYGEN BINDING

It is difficult to study the binding mechanism of
dioxygen in the biological system, either
experimentally or theoretically. However, there are
adequate experimental data about the bonding
mechanisms in inorganic complexes, and it is
therefore of interest to study the binding
mechanisms in these systems. Especially
interesting are the systems with triazacyclononane
(TACN) ligands. To gain information about the
mononuclear intermediates of the binding process,
we optimized the geometry of both the side-on (4a)
and end-on (4b) isomers of the mononuclear
superoxo adducts of Cu(I)L+ systems with TACN
ligands. Experimentally, side-on bonded superoxo
adducts were observed with the tridentate
hydrotris(3-t-butyl-5-isopropylpyrazolyl)borate
ligand [30] and terminal end-on complexes with the
tetradentate tris(2-pyridylmethyl)amine and
related ligands [31]. Based on these observations
and other related studies, one expects that the
tridentate TACN ligand also gives a stable side-on
bonded superoxo adduct. Experimentally, however,
such a complex has not been isolated, probably
because the dinuclear complex is the thermodynamic
product. Theoretical calculations make it possible
to characterize these transient species, and the
energetic comparisons provide insight into the
binding mechanism.

The calculated energetic and structural information
for the optimized structures is listed in Table II,
and the optimized geometries are shown in Figure
3. The geometry of the side-on bonded monomer is
similar to that of the corresponding part of the
dinuclear compound. There are, however, major
differences in the O-O and Cu-O bond lengths; the
O-O bond is much shorter, and the Cu-O bond is
longer, than in the dinuclear compound. The apical
Cu-N bond length is between that of the Cu(I)L+
monomer and the dinuclear compound. The end-on
superoxo complex has the shortest O-O bond,
1.28 Å, and its Cu-O bond length is close to that
in the (μ-O)2 dinuclear complexes. The conformation
of the TACN ligand in the end-on isomer is
different from that of the dinuclear adduct and the
side-on isomer. There are one short and two long
Cu-N bonds, similar to the monomeric Cu(I)L
system. However, the (N-)H atoms are almost eclipsed
with one of the hydrogens on the neighboring
carbon  atoms  in  the  side-on  conformer.  This  con¬ 
formation  would  be  less  likely  with  bulky  N-sub- 


TABLE II

Geometrical and energetic parameters of
mononuclear O2 adducts.(a)

Parameter               Cs side-on   C1 side-on   C1 end-on
O2 binding energy (b)      -89          -94          -81
R(O-O)                    1.310        1.311        1.281
R(Cu-O1)                  2.071        2.087        1.896
R(Cu-O2)                  2.071        2.047        2.786
R(Cu-N1)                  2.170        2.250        2.064
R(Cu-N2)                  2.088        2.070        2.193
R(Cu-N3)                  2.088        2.055        2.135
α                          82.7         88.0         81.6

Rows reported with two values only (in the order given):
β: 118.0, 113.3
γ: 22.5, 23.3
O-Cu-Cu-N1 dihedral: 90.0, 95.6

(a) Distances in angstrom, angles in degrees. Numbering of N
atoms: 1 is apical, 2 and 3 are equatorial atoms. α, β, γ are
N_i-Cu-N_j bending angles with i, j = (2,3), (1,2), and (1,3),
respectively.

(b) Reaction energy for O2 + Cu(I)L -> O2CuL.
FIGURE 3. Optimized geometries of superoxo
complexes. 


stituents.  Therefore,  our  calculations  most  likely 
overestimate  the  stability  of  the  side-on  conformer 
compared  to  the  inorganic  systems  they  represent. 

The binding energies of dioxygen to Cu(I)L
monomers are -94 and -81 kJ/mol for 4a and
4b, respectively. In comparison, the binding
energy of the dinuclear complex 1 is -60 kJ/mol.
The five-coordinate side-on adduct (4a) is expected
to be lower in energy than the four-coordinate
end-on (4b) structure because of the tendency of
Cu(II) toward five-coordination. However, the higher
stability of the mononuclear vs. the dinuclear system
is in contrast to the experimental results, where the
dinuclear complex was the thermodynamically
stable product. This disagreement is most likely
due to simplifications in our computational models
and assumptions rather than error in the theory.
Most importantly, the N-substituents may reverse
the stability of the mono- and dinuclear
systems. The reversal of stability upon ligand
substitution between mononuclear and dinuclear
copper dioxygen complexes has been experimentally
demonstrated. Karlin et al. [32] studied the kinetics
and thermodynamics of the oxygenation of
copper(I) complexes with tripodal tetradentate
ligands: tris[(2-pyridyl)methyl]amine (TMPA) and
corresponding ligands with one (BPQA), two
(BQPA), or three (TMQA) 2-quinolyl groups
substituting for the 2-pyridyl donors. The dioxygen
binding enthalpies of the dinuclear complexes were
between -50 and -80 kJ/mol, and the relative
thermodynamic stability of the mononuclear and
dinuclear complexes differs from ligand to
ligand. With the BPQA ligand the mononuclear
adduct is not observed, while the BQPA system
gives the mononuclear complex as the
thermodynamically stable product and the dinuclear
complex as an intermediate. The binding enthalpy for
the mononuclear adduct was measured to be -34
kJ/mol.

Nonetheless, the characterization of an end-on
coordinated mononuclear copper complex with a
tridentate ligand is an important outcome of this
study. Experimentally, only end-on bonded superoxo
complexes with the tetradentate TMPA-analog
ligands with 6-pivalolylamine substituents on the
pyridyl groups have been structurally characterized
[31]. For the side-on adduct our calculations
predict strong dioxygen binding, but this species
may not be observed because of a large kinetic
barrier at low temperature and unfavorable entropies
at high temperature. Considering these findings and
what is experimentally known about these systems, we
can comment on the mechanism of dioxygen binding
of CuTACN+ and related systems. Experimental
results suggest a mechanism which involves
the rate-determining formation of a mononuclear
superoxo adduct followed by the trapping of the
second monomeric Cu complex and swift equilibrium
between the two core isomers. The mononuclear
adduct cannot be observed as a stable intermediate,
and with excess O2 the kinetics is first
order in Cu concentration.

Another structural finding which helps in the
interpretation of the binding mechanism is the
distortion of the core into an asymmetrical position
in the C2 symmetry conformers of the dinuclear
TACN complex. Further distortion along the same
coordinate would result in a trans-μ-1,2-O2
dinuclear complex. This finding suggests that the
μ-η2:η2-O2 and trans-μ-1,2-O2 isomers may also
convert to one another. Experimentally, the
trans-μ-1,2-O2 dicopper complexes are well known [33].
Further, the formation of a trans-μ-1,2-O2
dinuclear system from a side-on complex is sterically
more favorable than from the end-on monomer.
Therefore, we suggest that O2 binding occurs
through the formation of a side-on
mononuclear adduct with a triplet ground state,
followed by the formation of a trans-μ-1,2-O2
dinuclear complex which rearranges into a μ-η2:η2-O2
complex. The trans-μ-1,2-O2 dinuclear complex
may be the transition state or a short-lived
transient form. This mechanism is shown in Scheme 1.

The binding mechanism of the inorganic mimic
may be significantly different from that of the
biological system. However, one key element of
this mechanism is the interconversion between the
trans-μ-1,2-O2 and the μ-η2:η2-O2 core isomers,
which was previously suggested by Ling et al. [34]
on the basis of resonance Raman spectroscopy of
oxy-Hcs and the corresponding normal coordinate
analysis. Further, such a mechanism is consistent
with the geometric data about the deoxy-Hc and
trans-μ-1,2-O2 dinuclear complexes. The typical
Cu-Cu distance in the end-on trans-μ-1,2
copper-peroxide complex [(Cu(I)L)2(O2)]2+ is 4.36 Å [33],
which is somewhat shorter than the 4.6 Å
found experimentally in one form of
deoxyhemocyanin from Limulus polyphemus [35]. The
constraints set by the protein environment clearly
have an effect on the final mechanism of O2
binding. Cruse et al. [36] have shown that forcing the
two copper atoms into close proximity by the
presence of a bridging ligand causes a dramatic
enhancement of the rate of reaction with O2, while
the reaction enthalpy does not change significantly
compared to the nonbridged system.


Conclusions 

We compared the electronic structure, electronic
excitation energies, charge distribution, and
magnetic properties of two dinuclear copper complexes
and the corresponding theoretical model complex
with ammonia ligands. We found that all properties
which are characteristic of the Cu2O2
chromophore are similar in all three systems. Since all
molecular orbitals involved in the electronic
transitions are either copper or oxygen based, the
electronic excitation energies are similar for the three
systems. The (-2J) magnetic constant is the property
most sensitive to ligand effects, due to ligand
participation in the LUMO, but ligand substitution


SCHEME  1 .  Mechanism  of  dioxygen  binding  and  O—O  bond  cleavage. 
does not change the diamagnetic property of the
complex.

To determine the mechanism of dioxygen binding,
we studied the mononuclear complexes with the
triazacyclononane ligand. The mononuclear complexes
were found to be more stable than the dinuclear
one, which is most likely related to the
simplifications in the model; in particular,
N-substitution could reverse the stability, as indicated by
experiment. We suggest that side-on mononuclear
complex formation is followed by binding of another
monomer in a trans-μ-1,2-O2 complex. The
trans-μ-1,2-O2 complex then transforms into the
μ-η2:η2-O2 core isomer, which is in equilibrium
with the (μ-O)2 isomer.
ABSTRACT: Recent work by the authors on the calculation of local solvent dielectric
constants  around  polyelectrolytes  using  the  Poisson-Boltzmann  approach  is  analyzed  in 
terms  of  the  effect  on  surface  potentials  and  counterion  concentrations.  Polyelectrolyte 
surface  geometry,  local  electric  fields,  and  counterion  distributions  contribute  to  the 
self-consistent  prediction  of  local  solvent  dielectric  constants.  For  an  all-atom  cell  model 
of  DNA  with  added  monovalent  salt  varying  from  0  to  0.5M,  the  Poisson-Boltzmann- 
determined  electrostatic  potential  increases  (negatively)  by  50-100%  upon  the  inclusion 
of  local  dielectric  constants.  This,  in  turn,  implies  that  hydronium  ion  concentrations  in 
the  major  and  minor  grooves  increase  by  about  0.65  and  0.35  pH  units,  respectively. 
While  counterion  concentrations  in  the  major  groove  change  only  slightly,  those  in  the 
minor  groove  increase  by  60-90%.  It  is  also  noted  that  while  the  local  dielectric  constant 
in the major groove monotonically increases away from the surface toward the bulk
value of water, the dielectric constant in the minor groove has a minimum about 2 Å from
the surface due primarily to the local electric field. Certain other properties, such as ionic
and  dipole  first  passage  times,  are  affected  little  by  local  dielectric  constants  (less  than 
about  3%).  ©  1997  John  Wiley  &  Sons,  Inc.  Int  J  Quant  Chem  65:  1087-1093,  1997 
Introduction

Recent  calculations  of  thermodynamic  proper¬ 
ties  of  polyelectrolytes  in  solution  rely  largely 
on the application of the Poisson-Boltzmann
method [1]. The Poisson-Boltzmann (PB) approach
belongs  to  that  class  of  numerical  techniques  that 
are  both  realistic  and  "computationally  conve¬ 
nient."  It  is  realistic  in  that  it  can  be  applied  to  a 
model  of  a  polyelectrolyte  such  as  DNA  in  which 
hundreds  and  possibly  thousands  of  atoms  are 
fixed  in  a  reasonable  "equilibrium"  configuration 
and  the  resulting  distribution  of  solvated  counteri¬ 
ons  determined.  It  is  "computationally  convenient" 
in  the  sense  [2]  that  calculations  on  systems  such 
as  those  above  can  be  performed  quickly  and  eas¬ 
ily  on  everyday  personal  computers. 

However,  despite  these  advantages  over  other 
popular  equilibrium  techniques  such  as  the  Monte 
Carlo  method,  several  approximations  inherent  in 
most  applications  of  the  PB  theory  limit  its  predic¬ 
tive  capability.  Most  often  mentioned  is  the 
traditional neglect of ion correlation and also
the cavalier treatments of both the solvent and the
solvated counterions in terms of a uniform
structureless dielectric constant and vanishing ionic radii.
The  effect  of  each  approximation  is  difficult  to 
estimate  but  comparison  with  well-defined  Monte 
Carlo  calculations  suggests  that  the  lack  of  ion 
correlation  yields  PB  values  for  monovalent  coun¬ 
terion  concentrations  at  the  surface  of  all-atom 
models  of  DNA  that  are  about  15-20%  low  [3-5]. 
Furthermore, corrections to standard PB theory that
describe counterions with nonzero radii [6] but
still neglect ion correlation lead to poorer
agreement with Monte Carlo data [7]. Inclusion of
correlation  within  the  PB  framework  is  possible 
[8],  but  its  application  to  typical  all-atom  polyelec¬ 
trolyte  models  is  far  less  "convenient"  than  are 
standard  approaches. 

The  focus  of  considerable  recent  work  has  been 
on  removing  the  constraint  of  a  uniform  solvent 
dielectric  constant.  Most  modern  PB  algorithms 
can  be  traced  back  to  the  work  of  Warwicker 
[9-11],  which  can  be  easily  modified  to  incorpo¬ 
rate  a  variable  local  (i.e.,  solvent)  dielectric  "field" 
[12-15].  These  latter  works  have  relied  on  edu¬ 
cated  guesses  about  local  dielectric  values,  so 
estimates  of  errors  involved  in  specific  solvent 
treatments are qualitative at best. To amend this
situation, the present authors presented a method
by which local solvent dielectric constants can be
calculated within the PB approximation [16].
Comparison with available experimental data shows this
self-consistent  PB  extension  to  be  as  accurate  as 
the  PB  method  itself,  i.e.,  it  yields  local  dielectric 
constants  that  are  about  15-20%  too  large  at  the 
surface  of  an  all-atom  representation  of  DNA.  The 
purpose  of  the  present  article  is  to  analyze  the 
effect  of  including  this  nonuniform  dielectric  con¬ 
stant  treatment  upon  the  resulting  PB  local  poten¬ 
tials  and  counterion  concentrations.  The  following 
section  gives  a  very  brief  overview  of  the  algo¬ 
rithm.  Numerical  results  and  conclusions  drawn 
from  them  are  presented  in  the  final  section. 


The  Calculation  of  Local  Dielectric 
Constants 

The  extension  of  the  original  Debye-Huckel 
method  [17]  (sometimes  called  the  linear  PB  ap¬ 
proximation)  to  ionic  distributions  around  poly¬ 
electrolytes  involves  describing  the  electrostatic 
potential φ(r) at points in the polyelectrolyte
environment by the Poisson equation

\[
\nabla \cdot [\epsilon(\mathbf{r}) \nabla \phi(\mathbf{r})] = -4\pi \rho(\mathbf{r}), \tag{1}
\]

where ε(r) denotes the spatially varying dielectric
constant and ρ(r) denotes the charge distribution
of  both  the  polyelectrolyte  and  its  ionic  environ¬ 
ment.  The  charge  distribution  may  be  expressed  as 
a  sum  of  individual  ionic  species  distributions: 

\[
\rho(\mathbf{r}) = \sum_{k} \rho_k(\mathbf{r}), \tag{2}
\]

where  k  signifies  a  particular  ionic  species.  The 
(normalized)  distribution  of  each  species  is  deter¬ 
mined  from  the  electrostatic  potential  through 
Boltzmann's  equation: 

\[
\rho_k(\mathbf{r}) = N_k \exp[-\beta z_k \phi(\mathbf{r})] \Big/ \int d\mathbf{r} \, \exp[-\beta z_k \phi(\mathbf{r})], \tag{3}
\]

where species k has charge z_k and number N_k
with β = 1/k_B T. The solution to coupled Eqs.
(1)-(3) for all-atom models of a polyelectrolyte
such  as  DNA  is  most  often  accomplished 
by  operating  on  a  three-dimensional  grid.  The 
finite-element representation of Eq. (1) on a
non-Cartesian grid is given by

\[
\phi_i = \frac{4\pi v_i \rho_i + \sum_{j} \phi_j \epsilon_{ij} S_{ij} / d_{ij}}{\sum_{j} \epsilon_{ij} S_{ij} / d_{ij}}, \tag{4}
\]


where spatial locations are now indicated by volume
element index i and the sum over element j
is only over those elements bordering element i. In
Eq. (4), v_i is the volume of element i; ε_ij = (ε_i +
ε_j)/2, the arithmetic average of the local dielectric
constants of elements i and j; S_ij, their shared
surface area; and d_ij, the distance between their
centers. (Simple Cartesian grids have S_ij and d_ij
constant.) The self-consistent iterative solution to
Eqs. (2)-(4) for the non-Cartesian grid used by the
authors with local dielectric constants fixed at
predetermined values (e.g., ε = 4 for volume
elements representing the DNA and ε = 78.5 for
bulk water with counterions) was described
elsewhere [15].
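
The update rule of Eq. (4) can be sketched as a simple relaxation sweep. This is an illustrative outline only, not the authors' code from [15]: the neighbor-list layout, the `fixed` set used to hold boundary elements at prescribed potentials, and the Jacobi-style update order are all assumptions made for the example.

```python
import math


def relax_potential(phi, rho, volume, eps, neighbors, fixed=(), sweeps=1):
    """Apply Jacobi-style sweeps of Eq. (4) to the element potentials.

    neighbors[i] is a list of (j, S_ij, d_ij) tuples for the elements j
    bordering element i; elements listed in `fixed` keep their potential.
    """
    phi = list(phi)
    for _ in range(sweeps):
        updated = list(phi)
        for i, links in enumerate(neighbors):
            if i in fixed:
                continue
            numerator = 4.0 * math.pi * volume[i] * rho[i]
            denominator = 0.0
            for j, s_ij, d_ij in links:
                # eps_ij is the arithmetic mean of the two local values.
                coupling = 0.5 * (eps[i] + eps[j]) * s_ij / d_ij
                numerator += phi[j] * coupling
                denominator += coupling
            updated[i] = numerator / denominator
        phi = updated
    return phi


# Three elements in a row; the end elements are held at fixed potentials.
chain = [[(1, 1.0, 1.0)], [(0, 1.0, 1.0), (2, 1.0, 1.0)], [(1, 1.0, 1.0)]]
uniform = relax_potential([0.0, 5.0, 2.0], rho=[0.0] * 3, volume=[1.0] * 3,
                          eps=[78.5] * 3, neighbors=chain, fixed={0, 2})
# Same chain, but element 0 is low-dielectric (DNA-like, eps = 4).
mixed = relax_potential([0.0, 5.0, 2.0], rho=[0.0] * 3, volume=[1.0] * 3,
                        eps=[4.0, 78.5, 78.5], neighbors=chain, fixed={0, 2})
print(uniform, mixed)
```

With a uniform dielectric and no charge, the middle element relaxes to the mean of its neighbors; lowering ε on one side weakens that neighbor's coupling and pulls the potential toward the other side.
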

As  mentioned  above,  previous  solutions  to  the 
finite-element  PB  equations  have  limited  the  de¬ 
scription  of  local  dielectric  constants  to  fixed  pre¬ 
determined  values  [12-15].  It  is,  however,  well 
known  that  the  relatively  high  bulk  value  of  the 
dielectric  constant  of  water  is  due  to  the  tetrahe¬ 
dral  nature  of  nearest-neighbor  bonding  [18,  19], 
which  would  be  interrupted  at  the  surface  of  a 
macromolecule.  Furthermore,  the  presence  of  high 
electric fields [20] and large counterion
concentrations [21] at the surface of polyelectrolytes would
also  be  expected  to  affect  local  solvent  dielectric 
constants.  While  surface  interruption  of  tetrahedral 
bonding  is  most  easily  regarded  as  a  "boundary 
condition"  independent  of  environmental  parame¬ 
ters  such  as  ion  type  or  concentration,  effects  due 
to  electric-field  strength  and  counterion  distribu¬ 
tion  need  to  be  determined  self-consistently  within 
the  PB  iterative  algorithm.  (Ideally,  the  effect  of 
ionic  concentration  would  also  be  taken  into  ac¬ 
count  in  determining  the  equilibrium  structure  of 
polyelectrolytes  such  as  DNA.)  The  specific  inclu¬ 
sion  of  these  surface,  electric  field,  and  ionic  distri¬ 
bution  effects  into  the  PB  approach  with  a  view 
toward  comparing  predicted  local  solvent  dielec¬ 
tric  constants  with  experimental  data  was  the 
subject  of  a  previous  article  [16].  Here,  we 
are  interested  in  how  these  effects  alter  PB- 
determined  potentials  and  counterion  distributions 
near  the  surface  of  the  DNA. 


To  calculate  the  result  of  bond  disruption  at  the 
surface  of  a  macromolecule,  we  begin  with  the 
original  Kirkwood  result  for  the  bulk  dielectric 
constant  of  water  [18]: 

\[
\epsilon = 2\pi\beta N_A g \mu^2 + n^2/2, \tag{5}
\]

where the g-factor accounting for dipole correlation
is given by

\[
g = 1 + z \langle \cos\theta \rangle, \tag{6}
\]

with z being the number of nearest neighbors of a
central molecule, ⟨cos θ⟩ denoting the average
over nearest neighbors of the angle between the
molecular dipole vectors, and μ representing the
external moment of a spherical sample with
macroscopic index of refraction n. For water at 298
K, Kirkwood used g = 2.64 and μ = 2.15 D,
giving a bulk dielectric value of ε = 63. Haggis et al.
[22,  23]  extended  the  Kirkwood  calculation  by  ex¬ 
plicitly  including  the  tetrahedral  bonding  of  water. 
The  application  and  modification  of  their  theory  to 
account  for  tetrahedral  bond  disruption  at  surfaces 
is  presented  in  [16].  The  result  is  that  the  correla¬ 
tion  and  total  moment  of  a  spherical  cluster  can  be 
expressed  as  a  sum  over  cluster  sizes: 

\[
g\mu^2 = \sum_{m} n_m \, g_m(f) \, [\mu_m(f)]^2, \tag{7}
\]

where the central molecule of a cluster is bonded
to m other water molecules, with m ranging from
0 to 4. In Eq. (7), n_m is the relative population of
clusters with bond number m, and g_m and μ_m are
determined by linearly interpolating between an
isolated water molecule (g_0 = 1, μ_0 = 1.85 D) and
an ice cluster (g_4 = 2.91, μ_4 = 2.45 D). Surface
effects are represented by the variable f, which
denotes the fraction of the volume of a sphere (or
cluster) of radius R accessible to the solvent. For a
sphere at a distance w from a plane boundary, this
fraction is given by

\[
f(w) = 0.5 + 0.75\,(w/R) - 0.25\,(w/R)^3. \tag{8}
\]
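
Equation (8) is the volume fraction of a sphere left after a planar cut at distance w from its center, which can be checked against the standard spherical-cap volume formula. The sketch below does that; the function names and the sample radius are illustrative assumptions, not values from [16].

```python
import math


def accessible_fraction(w, radius):
    """Eq. (8): accessible volume fraction of a sphere of the given radius
    whose center lies a distance w (0 <= w <= radius) from a plane boundary."""
    if not 0.0 <= w <= radius:
        raise ValueError("w must lie between 0 and the sphere radius")
    t = w / radius
    return 0.5 + 0.75 * t - 0.25 * t ** 3


def fraction_from_cap_geometry(w, radius):
    """Same quantity from the spherical-cap volume pi*h^2*(3R - h)/3,
    where h = R - w is the height of the cap cut off by the plane."""
    h = radius - w
    cap = math.pi * h ** 2 * (3.0 * radius - h) / 3.0
    sphere = 4.0 * math.pi * radius ** 3 / 3.0
    return 1.0 - cap / sphere


R = 6.0  # hypothetical cluster radius in angstroms, for illustration only
for w in (0.0, 1.5, 3.0, 4.5, 6.0):
    print(w, accessible_fraction(w, R), fraction_from_cap_geometry(w, R))
```

The two expressions agree over the whole range, from f = 0.5 for a sphere centered on the boundary to f = 1 when the sphere just touches it.
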

Note  that  in  [16]  the  phrase  prior  to  the  description 
of  Eq.  (12),  which  defines  f(w),  should  read  "...  we 
equate  the  probability  that  one  or  more  cluster 
branches  is  not  shortened  to  that  fraction  of  the 
volume  of  a  sphere  of  radius  R  that  is  accessible  to 
the  cluster."  For  more  realistic  all-atom  models  of 
macromolecules, the fraction f_i at volume elements
near the surface (within about 6 Å) needs to
be determined numerically. At 298 K, Eqs. (5) and
(7) may be combined and simplified to give the
local dielectric constant at element i (in the
presence of a boundary) [16]:

e?  =  17.5(1  +  1.7/, 1/6  +  05/y 3  +£ 

+  0.24/;1/2)  +  0.8.  (9) 

The second effect that we want to account for is
the diminution of the dielectric constant of water
due to dielectric saturation at high electric-field
strengths. Booth [20] derived an expression that
reproduces the bulk value in the absence of a field.
This result may be combined with Eq. (9) and
written [16]

\[
\epsilon_i^{BE} = 1.8 + (\epsilon_i^{B} - 1.8) \, L(0.08 E_i), \tag{10}
\]

where ε_i^B is the local dielectric constant due to
boundary effects only; L(x), the Langevin-type
function 3[coth(x) - 1/x]/x; and E_i, the
electric-field strength in mV/Å at element i (at 298 K).
Electric-field strengths are easily determined from
potentials using the relation

\[
E_i^2 = \sum_{j} [(\phi_i - \phi_j)/d_{ij}]^2, \tag{11}
\]

where the sum is only over neighboring volume
elements.
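
A minimal sketch of Eqs. (10) and (11) follows, assuming potentials in mV and distances in Å so that the field comes out in mV/Å. The function names and the sample inputs are illustrative; the small-argument branch uses the series L(x) ≈ 1 - x²/15 to avoid loss of precision near zero field.

```python
import math


def langevin_type(x):
    """L(x) = 3[coth(x) - 1/x]/x, which tends to 1 as x -> 0."""
    if abs(x) < 1e-4:
        return 1.0 - x * x / 15.0  # leading terms of the series expansion
    return 3.0 * (1.0 / math.tanh(x) - 1.0 / x) / x


def field_strength(phi_i, neighbor_links):
    """Eq. (11): E_i from the potentials of neighboring elements.

    neighbor_links is a list of (phi_j, d_ij) pairs.
    """
    return math.sqrt(sum(((phi_i - phi_j) / d_ij) ** 2
                         for phi_j, d_ij in neighbor_links))


def eps_with_field(eps_boundary, field):
    """Eq. (10): boundary-corrected dielectric constant reduced by the field."""
    return 1.8 + (eps_boundary - 1.8) * langevin_type(0.08 * field)


# Hypothetical element with two neighbors 1 A away.
field = field_strength(0.0, [(3.0, 1.0), (4.0, 1.0)])
print(field, eps_with_field(78.5, 0.0), eps_with_field(78.5, 100.0))
```

At zero field the input value is returned unchanged, and the dielectric constant decreases monotonically toward 1.8 as the field grows.
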

The third effect to be considered is the dielectric
decrement of water in the presence of ions. This
type of dielectric saturation [21] results from the
orientational freezing of water molecules in the
solvation shell of an ion and depends on the size
of the ion as well as on the number of water
molecules the ion "freezes." For monovalent
counterions represented by sodium, the local solvent
dielectric constant is given by [16]

\[
\epsilon_i = \epsilon_i^{BE} \, [(1 - 0.96\,\theta_i)/(1 + 0.5\,\theta_i)], \tag{12}
\]

where θ_i is the local volume fraction of counterions
(including frozen water molecules),

\[
\theta_i = c_i[\mathrm{Na}^+] / (c_i[\mathrm{Na}^+] + 11.2). \tag{13}
\]

c_i[Na+] is the local molar concentration of
counterions in element i, and ε_i^BE, the local dielectric
constant due to boundary and electric-field effects.
Equation (12) is used in Eq. (4) and the charge
distributions [Eq. (3)] and potentials [Eq. (4)] are
iterated until self-consistency. The result was
considered converged when the sum of the average
rms values of the total charges, potentials, and
field strengths at all environmental points between
consecutive iterations was less than 10^-6.
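
Equations (12) and (13) reduce to a two-line calculation once the boundary- and field-corrected dielectric constant is known. The sketch below takes that value as an input; the function names are illustrative, and the sample concentrations are chosen only to show the trend.

```python
def counterion_volume_fraction(conc_molar):
    """Eq. (13): local volume fraction of counterions, including the
    water molecules they immobilize, from the molar concentration."""
    return conc_molar / (conc_molar + 11.2)


def eps_with_ions(eps_boundary_field, conc_molar):
    """Eq. (12): local dielectric constant after the ionic decrement."""
    fraction = counterion_volume_fraction(conc_molar)
    return eps_boundary_field * (1.0 - 0.96 * fraction) / (1.0 + 0.5 * fraction)


# Bulk-like starting value with increasing local counterion concentration.
for conc in (0.0, 0.5, 1.0, 3.0):
    print(conc, eps_with_ions(78.5, conc))
```

With no counterions the input value is returned unchanged; at the molar concentrations found near the DNA surface the reduction amounts to tens of percent.
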


Results  and  Conclusions 

The polyelectrolyte system that we consider is
identical to that used in [16]. An all-atom model of
B-DNA was contained within a cylindrical cell of
100 Å radius with added salt in concentrations of
0, 0.1, 0.2, and 0.5 M. As before, environmental
points were placed in layers contouring the van
der Waals surface of the macromolecule. The data
in Tables I and II are average potentials and cation
concentrations by groove for calculations with [Eq.
(12)] and without (ε_i = 78.5) the inclusion of the
dielectric effects discussed above. The results for
average local dielectric constants in the grooves are
given in Table III. In all cases, the dielectric
constant of cells partitioning the DNA was set to 4
and a temperature of 298 K assumed. In their
closest approach to the surface of DNA, solvated
sodium counterions were modeled as 1.4 Å radius
spheres.

Table  I  shows  that  the  effect  of  including  a 
variable  dielectric  constant  upon  the  electrostatic 
potential  (relative  to  that  at  the  cell  boundary)  is 
confined  mainly  to  the  first  environment  layer  at 
the  surface  of  DNA.  While  limited  in  radial  extent, 
the  deviations  range  from  the  fixed  dielectric  con¬ 
stant  potentials  by  50-100%.  The  size  of  the  effect 
increases  with  added  salt,  due,  of  course,  to  an 
increase  in  surface  counterion  concentrations  and 
accounted  for  by  Eq.  (12).  The  effect  is  only  moder¬ 
ately  sensitive  to  angular  placement  around  the 
surface  of  the  DNA. 

While counterions are absent from this inner
Helmholtz layer, the increase in surface potential
may be relevant to the calculation of pKa values
at specific protonation sites. Local concentrations
of solvated protons are readily calculated from the
potentials given in Table I through p[H]_i = pH +
0.017 φ_i (mV) [6]. For a pH value of 7 at the cell
boundary (φ = 0), average p[H] values in the
second layer (2.1 Å from the surface) obtained by
including local dielectric effects are only about 0.1
unit lower than without the effects, giving values
of 4.4, 5.6, 5.8, and 6.1 at added salt concentrations
of 0, 0.1, 0.2, and 0.5 M, respectively, with angular
deviations less than 0.2 units. However, the much
steeper potential gradient at the surface in the
presence of local dielectric effects implies that a
reliable distance of the closest approach to the
DNA surface for the proton (as a hydronium ion)
TABLE I

Average potentials (mV, relative to the potential at the cell boundary) in the major and minor grooves
and elsewhere of B-DNA in the presence of various concentrations of added salt with [Eq. (12)]
and without (ε = 78.5) the inclusion of the algorithm for local dielectric constants; distances are
measured relative to the van der Waals surface of DNA.

                                           Potential (mV)
                       Fixed solvent dielectric constant   Variable solvent dielectric constant
R (Å)   Region           0 M    0.1 M   0.2 M   0.5 M        0 M    0.1 M   0.2 M   0.5 M
0.7     Major groove    -205    -139    -124    -105        -319    -255    -241    -225
        Minor groove    -145     -77     -62     -42        -177    -109     -93     -73
        Elsewhere       -173    -107     -92     -74        -254    -190    -176    -160
        Average         -187    -121    -106     -87        -280    -215    -202    -186
2.1     Major groove    -149     -83     -68     -50        -152     -86     -71     -52
        Minor groove    -141     -74     -59     -40        -157     -88     -72     -52
        Elsewhere       -141     -76     -61     -43        -149     -83     -69     -50
        Average         -145     -79     -64     -46        -151     -85     -70     -51
3.5     Major groove    -124     -60     -46     -30        -120     -57     -43     -27
        Minor groove    -127     -62     -47     -30        -129     -63     -48     -31
        Elsewhere       -123     -59     -45     -28        -122     -58     -44     -28
        Average         -124     -60     -46     -29        -123     -58     -44     -28
5.9     Average          -98     -37     -25     -13         -93     -34     -23     -11
9.3     Average          -79     -24     -14      -5         -75     -21     -12      -4
13.0    Average          -62     -13      -7      -2         -59     -12      -6      -1


TABLE II

Average cation concentrations (M) near B-DNA as described in Table I.

                                           Concentration (M)
                       Fixed solvent dielectric constant   Variable solvent dielectric constant
R (Å)   Region           0 M    0.1 M   0.2 M   0.5 M        0 M    0.1 M   0.2 M   0.5 M
2.1     Major groove    3.12    3.53    3.76    4.27        3.16    3.59    3.82    4.33
        Minor groove    1.66    1.93    2.08    2.44        3.04    3.37    3.55    3.95
        Elsewhere       1.82    2.16    2.36    2.84        2.64    3.06    3.30    3.84
        Average         2.38    2.74    2.95    3.43        2.91    3.33    3.56    4.07
3.5     Major groove    0.92    1.17    1.32    1.67        0.75    0.99    1.13    1.47
        Minor groove    0.97    1.19    1.32    1.64        1.04    1.26    1.39    1.72
        Elsewhere       0.88    1.09    1.23    1.55        0.86    1.07    1.20    1.52
        Average         0.91    1.13    1.27    1.61        0.85    1.07    1.20    1.53
5.9     Average         0.32    0.46    0.56    0.84        0.26    0.41    0.50    0.78
9.3     Average         0.15    0.27    0.36    0.63        0.13    0.24    0.33    0.60
13.0    Average         0.07    0.18    0.27    0.54        0.06    0.17    0.26    0.54

TABLE III

Average local dielectric constants in the major and minor grooves and elsewhere of B-DNA in the presence
of various concentrations of added salt calculated using Eq. (12) in the Poisson-Boltzmann equation.

                              Dielectric constant
R (Å)  Region              0 M    0.1 M    0.2 M    0.5 M
0.7    Major groove         20       20       20       20
       Minor groove         46       46       45       45
       Elsewhere            23       23       23       23
       Average              23       23       23       23
2.1    Major groove         31       29       29       27
       Minor groove         29       28       27       26
       Elsewhere            34       33       32       30
       Average              32       31       30       28
3.5    Major groove         59       58       57       56
       Minor groove         41       40       39       38
       Elsewhere            55       54       53       52
       Average              54       53       53       51
5.9    Average              71       71       70       69
9.3    Average              76       76       75       73
13.0   Average              77       77       76       73
is crucial. Assuming that this distance is 1.4 Å leads to the above p[H] values. It is more likely
that solvated protons can approach the surface closer than this. If a distance of 1 Å is assumed,
surface proton concentrations are then much more sensitive to local dielectric effects and show a
much greater angular dependence. Approximate local p[H] values may be (quadratically) interpolated
from the potentials at 1.5 Å from the surface to give values in the major groove of 3.5, 4.6, 4.9,
and 5.2 at added salt concentrations of 0, 0.1, 0.2, and 0.5 M, respectively. The values average
0.65 units lower than when a constant solvent dielectric constant is used. Minor groove p[H] values
average 0.75 units higher than major groove values, an observation in the opposite direction of
that found previously [6]. This is due to the specific sizes and concentrations of counterions
assumed in the two studies as well as to the incorporation of activity corrections to the standard
PB approach. This emphasizes the importance of local surface-counterion geometries and conditions
in determining specific thermodynamic properties, such as local p[H], which, for DNA, may provoke a
selective biochemical response [24]. If future PB applications are to focus on surface-determined
properties of polyelectrolytes, as indeed recent work on pKa's seems to indicate, then a much
better analysis and improvement of the PB theory at such surfaces is warranted.
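
As a rough guide to how a surface potential maps onto a local p[H], the ideal Boltzmann relation can be written out. This is only a sketch: it ignores the activity corrections and finite ion sizes discussed above, and the symbols pH_bulk, ψ(r), and T are introduced here for illustration rather than taken from the paper.

```latex
[\mathrm{H^+}]_{\mathrm{local}}(r) = [\mathrm{H^+}]_{\mathrm{bulk}}\,
    \exp\!\left(-\frac{e\,\psi(r)}{k_B T}\right)
\quad\Longrightarrow\quad
\mathrm{p[H]}_{\mathrm{local}}(r) = \mathrm{pH}_{\mathrm{bulk}}
    + \frac{e\,\psi(r)}{k_B T \ln 10}
```

At 298 K the factor k_B T ln 10 / e is about 59 mV, so under this idealization every -59 mV of local potential lowers the local p[H] by roughly one unit.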


Counterion concentrations consistent with the potentials of Table I are given in Table II. The
largest deviations are observed in the counterion layer closest to the surface, which corresponds
to the second layer listed in Table I. Of particular interest is the sensitivity of minor groove
counterion concentrations to variable dielectric effects as well as the constancy of major groove
concentrations. With counterion concentrations already saturating the major groove, inclusion of a
variable local dielectric constant does not increase values there much. However, elsewhere at the
surface, particularly within the minor groove, concentrations increase toward major groove values.
In fact, in the second ion layer, the minor groove concentration is larger than the major groove
value. Isolation of the individual dielectric decrement effects shows that the electric-field
component plays a much larger role in the minor groove than in the major groove.
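
The link between a potential and a counterion concentration can be illustrated with the ideal Boltzmann relation. The sketch below assumes monovalent point ions at 298.15 K and no activity corrections, so it is not the modified PB treatment used for Tables I and II; the function names and the temperature are assumptions made for this example. Far from the surface, where the corrections are small, it lands close to the tabulated averages.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C


def thermal_voltage_mv(temperature_k=298.15):
    """Return kT/e in millivolts (assumed temperature: 298.15 K)."""
    return 1000.0 * K_B * temperature_k / E_CHARGE


def boltzmann_concentration(bulk_molar, potential_mv, valence=1, temperature_k=298.15):
    """Ideal Boltzmann estimate of a local ion concentration (M).

    bulk_molar   : concentration where the potential is zero
    potential_mv : local electrostatic potential in mV
    valence      : ion charge number (+1 for a monovalent cation)
    """
    reduced = valence * potential_mv / thermal_voltage_mv(temperature_k)
    return bulk_molar * math.exp(-reduced)


if __name__ == "__main__":
    # 13.0 Å from the surface, 0.5 M added salt, fixed dielectric: -2 mV (Table I).
    print(round(boltzmann_concentration(0.5, -2.0), 2))  # 0.54, as in Table II
```

Closer to the surface the tabulated values are averages over strongly varying potentials and include ion-size and activity effects, so this simple relation should not be expected to reproduce them.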

From Table III, the average solvent dielectric constant in the grooves is seen to be essentially
independent of added salt concentration. With the exception of the minor groove, the local
dielectric constant of all other regions monotonically increases away from the surface toward the
bulk value. The minor groove, however, displays a minimum dielectric constant about 2 Å from the
surface due to the presence of counterions as well as the high electric-field strength in the region.




Whereas potential and counterion concentrations are affected mainly in the layer closest to the
surface, solvent dielectric constant values are still 10% below the bulk value as far as 6 Å from
the surface. The significance of this in accurately modeling the electrostatic force between a
molecule (protein, ligand) and DNA or between an electron donor-acceptor pair interacting near the
surface is clear.

However, for processes further from the surface or for those where an average over the entire
environment space is required, the lowering of the dielectric constant at the DNA surface may not
be as important. Consider the charged cylinder cell model calculation of our previous work [16] in
which 50 mM monovalent salt was added to a neutral system consisting of a 10 Å radius cylinder with
the same surface charge density as DNA in a cell of 100 Å radius. The first passage time for an ion
or molecular dipole to diffuse from the outer cell to the cylinder surface can easily be calculated
given the interaction energy (involving the PB electrostatic potential, the ionic charge or dipole
moment, and the spatially varying dielectric constant) as a function of the radial distance from
the cylinder [25]. Assuming that the diffusion constant for the ion or dipole is spatially
invariant, the first passage time with a variable solvent dielectric constant for mono- or divalent
charges is only about 2% less than that with a fixed solvent dielectric constant equal to 78.5. For
dipole moments of 2 and 10 D, the deviations are only 1 and 3% less, respectively. These results
are not totally unexpected since the first passage time is not strongly dependent on the
interaction energy.
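
The expression from [25] is not reproduced in the text. As an assumed stand-in, the sketch below uses the standard Smoluchowski mean first passage time for radial diffusion in cylindrical symmetry, with a reflecting outer cell boundary at radius b and an absorbing cylinder surface at radius a, and a spatially invariant diffusion constant D. The function name, the callable `u_over_kt` (interaction energy in units of kT), and the trapezoid quadrature are all choices made for this illustration.

```python
import math


def mean_first_passage_time(u_over_kt, a, b, diffusion, n=2000):
    """Mean first passage time from r = b (reflecting) to r = a (absorbing).

    Evaluates, with the trapezoid rule,
        tau = (1/D) * int_a^b dr [exp(U(r)/kT) / r] * int_r^b r' exp(-U(r')/kT) dr'
    """
    h = (b - a) / n
    r = [a + i * h for i in range(n + 1)]
    weight = [ri * math.exp(-u_over_kt(ri)) for ri in r]

    # inner[i] holds the integral of the weight from r[i] out to b.
    inner = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        inner[i] = inner[i + 1] + 0.5 * h * (weight[i] + weight[i + 1])

    outer = [math.exp(u_over_kt(ri)) / ri * ii for ri, ii in zip(r, inner)]
    total = sum(0.5 * h * (outer[i] + outer[i + 1]) for i in range(n))
    return total / diffusion


if __name__ == "__main__":
    a, b = 10.0, 100.0  # cylinder and cell radii in Å, as in the cell model above
    free = mean_first_passage_time(lambda r: 0.0, a, b, diffusion=1.0)
    # Hypothetical weakly attractive energy profile, for comparison only.
    attracted = mean_first_passage_time(lambda r: -(b - r) / (b - a), a, b, diffusion=1.0)
    print(free, attracted)
```

With no interaction the integral has the closed form [b² ln(b/a) - (b² - a²)/2] / (2D), which is a convenient check on the quadrature; an attractive profile shortens the time, and comparing two profiles that differ only near the surface gives a feel for why the reported differences are small.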

The results presented here are only suggestive of what is likely to be observed in other systems.
Ion size, surface geometry, electrostatic potential, counterion concentration, and local dielectric
constants are all closely coupled so that only broad generalities may be drawn. It is likely that
some cases warrant the inclusion of local dielectric constants more than others, with the
particular thermodynamic property under investigation being the prime consideration. In those
cases, the effect may be considerable.
ABSTRACT: In earlier investigations [N. S. Kim, P. R. LeBreton, J. Am. Chem. Soc. 118,
3694 (1996), and references therein], ultraviolet (UV) photoelectron data and ab initio
molecular orbital calculations yielded information about nucleotide electronic structure
by providing valence ionization potentials (IPs) of nucleotide bases and sugar model
compounds. Here, model phosphate group ionization potentials have been evaluated by
employing multireference, single and double excitation configuration interaction
calculations with a complete active space self-consistent field (CASSCF) wave function.
The five lowest energy IPs of H₂PO₄⁻ and the four lowest energy IPs of (CH₃)HPO₄⁻ and
(CH₃)₂PO₄⁻ were evaluated. Calculations were performed using a (12,9,1)/[6,4,1]
double-zeta basis set on phosphorus with a d-polarization function; (10,7)/[4,3] and
(10,6)/[4,2] basis sets on oxygen and carbon, respectively; and a (6)/[3] basis set on
hydrogen. Two types of CASSCF calculations were carried out. In one, denoted 8e8a/7e8a,
the anions and radicals had 8 and 7 electrons, respectively, in 8 active orbitals. In the
second, denoted 10e10a/9e10a, there were 10 and 9 electrons, respectively, in 10 active orbitals.
Ionization potentials of H₂PO₄⁻, (CH₃)HPO₄⁻, and (CH₃)₂PO₄⁻ were also obtained from
second-order perturbation theory (CASPT2) calculations with the CASSCF reference
functions. All of the ionization events examined are associated with removal of electrons
from oxygen atom lone-pair orbitals on the closed-shell anions. For H₂PO₄⁻ and
(CH₃)HPO₄⁻, the lowest energy CASPT2 ionization potentials obtained (4.29-4.36 eV and
4.12-4.27 eV, respectively) are smaller than corresponding IPs (4.89 and 4.69 eV)
previously reported from results of 6-31+G* second-order Møller-Plesset (MP2)
calculations. For the second through fifth IPs of H₂PO₄⁻, and the second through fourth
IPs of (CH₃)HPO₄⁻, the values obtained from CASPT2 calculations are 0.42 to 1.95 eV
smaller than values reported from the combined use of MP2 and configuration interaction
singles calculations with the 6-31+G* basis set. A combination of results from MP2 and
CASPT2 calculations yields values of 4.89, 5.42, 6.30, 6.64, and 7.41 eV for the five lowest
IPs of H₂PO₄⁻; and values of 4.69, 5.43, 6.08, and 6.55 eV for the four lowest IPs of
(CH₃)HPO₄⁻. © 1997 John Wiley & Sons, Inc. Int J Quant Chem 65: 1095-1106, 1997


Introduction 

Recent quantum mechanical calculations have provided ionization potentials [1-4], electron
affinities [3, 4], and vibrational [5, 6] transition energies in nucleotide components. Recent
results from quantum calculations have also described electronic influences on tautomerism [5-7],
hydrogen bonding and solvation [8], base stacking [9], proton transfer in radical anions [10],
radiation-induced modification mechanisms [11], and alkylation reactions [12-16] involving
nucleotides, nucleotide components, and nucleotide model compounds. Essential to all of these
investigations are reliable and accurate descriptions of valence electrons.

Because nucleotides contain between 160 and 180 electrons, it is not currently possible to
describe their valence structure by employing highly rigorous computational methods. To date, the
electronic structures of intact nucleotides have only been examined at the self-consistent field
(SCF) level [9, 12, 13, 17-21]. The results of these calculations indicate that the upper occupied
π and lone-pair orbitals are largely localized on either the base, sugar, or phosphate groups and
have electron distributions which are similar to those occurring in the separated components.
While it is not possible to easily assess the reliability of SCF descriptions of valence electrons
in intact nucleotides, it is possible to test descriptions of nucleotide components. One strategy,
employed by two of the contributors to this investigation (S. M. F. and P. R. L.), has been to
compare calculated ionization potentials (IPs) of nucleotide base and sugar model compounds,
obtained by applying Koopmans' theorem [22] to SCF results, with experimental IPs obtained from
He(I) ultraviolet (UV) photoelectron (PE) experiments [1, 2, 12, 13, 23, 24]. In addition to
testing the SCF results for nucleotide components, these comparisons of theoretical and
experimental IPs have provided a means by which valence IPs obtained from SCF calculations on
intact nucleotides can be individually corrected so as to provide more reliable values of as many
as 14 of the lowest energy IPs in the nucleotides [9, 12, 13, 18-21].

While PE data for nucleotide bases and sugar model compounds have provided an experimental basis
for assessing theoretical descriptions of the valence electrons in the neutral base and sugar
groups of nucleotides, experimental IPs are available for only a few oxygen- and
phosphorus-containing anions. It is necessary to obtain an understanding of phosphate esters which
is as reliable as the current understanding of bases and sugars in order to develop highly accurate
descriptions of the electronic properties of nucleotides. However, to our knowledge, no
experimental IPs have yet been reported for anionic phosphate esters.

In earlier theoretical investigations, SCF calculations on phosphate esters were employed to
examine orbital energies [11, 25], electrostatic potentials [26], and conformational properties
[27]. In post-SCF investigations [12, 13], second-order (MP2) and third-order (MP3) Møller-Plesset
perturbation calculations with the 6-31+G* basis set [28] (MP2/6-31+G* and MP3/6-31+G*) yielded
values of 4.89 and 4.87 eV, respectively, for the first ionization potential of H₂PO₄⁻. Here, the
first IP was obtained as the difference between the energies of the ground-state, closed-shell
anion and the ground-state radical formed by removal of an electron. In test MP2/6-31+G*
calculations on CH₃O⁻, PO₂⁻, and PO₃⁻, and the corresponding ground-state radicals, theoretical
IPs obtained in this manner differed from experimental IPs by less than 0.3 eV [12]. In an
extension of this approach, excitation energies of H₂PO₄ obtained from 6-31+G* calculations using
the configuration interaction singles (CIS) method [29] were combined with results from MP2
calculations on the ground states of H₂PO₄⁻ and H₂PO₄. This combination of calculations (denoted
MP2/CIS) provided values for the second to fifth lowest energy IPs of H₂PO₄⁻. The same approach
was also applied to (CH₃)HPO₄⁻ [18]. When employed to calculate the first five IPs of NO₂⁻, the
MP2/CIS method yielded a value of the lowest energy IP which differed from the experimental value
by less than 0.1 eV. However, the theoretical values of the higher energy IPs were less accurate.
In one case, the experimental IP differed from the theoretical IP by more than 0.9 eV [12].

FIGURE 1. Choice of axis for H₂PO₄⁻.

The goal of the present investigation is to employ a more rigorous computational approach which
relies on the multireference CI method with singles and doubles, and a complete active space
self-consistent field (CASSCF) wave function [30]. Second-order perturbation theory (CASPT2)
calculations with the CASSCF wave functions [31, 32] have also been employed to describe the five
lowest energy ionization events in H₂PO₄⁻, and the four lowest energy ionization events in
CH₃HPO₄⁻ and (CH₃)₂PO₄⁻.


Methods 

The symmetry of H₂PO₄⁻ and (CH₃)₂PO₄⁻ employed in the calculations was C2v; the symmetry of
CH₃HPO₄⁻ was Cs. The geometry for H₂PO₄⁻ has been obtained previously by optimization at the
6-31+G* level [12, 28]. The choice of axis for H₂PO₄⁻ is shown in Figure 1. For CH₃HPO₄⁻,
heavy-atom bond lengths were obtained from deoxycytidylyl(3'-5')deoxyguanosine crystallographic
data [18, 33]. The geometry used for (CH₃)₂PO₄⁻ was based on those for H₂PO₄⁻ and (CH₃)HPO₄⁻.
Bond lengths and bond angles used for calculations on the phosphate anions are given in Table I.
The geometries of the radicals were taken to be the same as those of the anions, and the
calculated IPs correspond to vertical ionization potentials.

TABLE I

Bond lengths and bond angles of H₂PO₄⁻, CH₃HPO₄⁻, and (CH₃)₂PO₄⁻.

Bond    Length (a)    Atoms    Angle (b)
P-O     1.63          O=P-O    109.74
P=O     1.48          O-P-O     94.74
O-H     0.95          P-O-H    109.13
O-C     1.40          P-O-C    117.69
C-H     1.08          O-C-H    110.96

(a) In angstroms.
(b) In degrees.

Polarization and diffuse functions are needed to obtain reliable results on phosphate anions
[25, 34, 35]. For phosphorus a (12,9,1) Gaussian basis set was contracted to [6,4,1], resulting in
a double-zeta basis set [37] augmented with one d-polarization function, with an orbital exponent
[α_d(P)] of 0.43 [38]. Values of the 3d exponent found in the literature range between 0.39 and
0.60 [25, 28, 36, 38-41]. An oxygen (10,7) basis set, contracted to [4,3], was derived from a
previously reported (10,6) basis set [42] by adding one set of diffuse p functions. The value of
the orbital exponent (α_p) was 0.0564 and was chosen according to the even-tempered criterion. In
comparison, the value of the exponent optimized for the oxygen anion is 0.0515. For carbon and
hydrogen, previously reported (10,6) and (6) basis sets [42, 43] were contracted to [4,2] and [3],
respectively.
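
One common reading of the even-tempered criterion is that exponents form a geometric series, so an extra diffuse function continues that series one step below the smallest existing exponent. The sketch below assumes this reading; the function name and the exponents in the example are hypothetical and are not the oxygen exponents of reference [42].

```python
def next_even_tempered_exponent(exponents):
    """Extend a geometric (even-tempered) series by one more diffuse exponent.

    The ratio is estimated from the two smallest exponents supplied.
    """
    if len(exponents) < 2:
        raise ValueError("need at least two exponents to estimate the ratio")
    smallest, next_smallest = sorted(exponents)[:2]
    ratio = next_smallest / smallest  # beta in alpha_k = alpha * beta**k
    return smallest / ratio


if __name__ == "__main__":
    # Hypothetical p exponents with a ratio of 3, for illustration only.
    hypothetical_p_exponents = [2.7, 0.9, 0.3]
    print(next_even_tempered_exponent(hypothetical_p_exponents))  # close to 0.1
```

Applied to the two most diffuse p exponents of an actual basis set, the same ratio rule would yield the added exponent; whether this matches the authors' exact procedure cannot be confirmed from the text.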

Ionization potentials were obtained as the difference between the energies calculated for the
anion and the radical at the SCF, CASSCF, and CASPT2 levels. For H₂PO₄⁻, one set of IPs was
obtained from a CASSCF calculation on H₂PO₄⁻ (denoted 8e8a) in which there were eight electrons in
eight active orbitals, two within each irreducible representation. In conjunction with the 8e8a
calculation on H₂PO₄⁻, a CASSCF calculation (denoted 7e8a) was also carried out on the H₂PO₄
radical. This had seven electrons in eight active orbitals. The H₂PO₄⁻ and H₂PO₄ calculations
yielded values (denoted 8e8a/7e8a) for the four lowest ionization potentials. The four IPs
corresponded, respectively, to the removal of one electron from the highest occupied molecular
orbital of each irreducible representation (orbitals 11a₁, 2a₂, 6b₁, and 6b₂). For H₂PO₄⁻, a
second set of ionization potentials (denoted 10e10a/9e10a) was obtained from 10e10a calculations on
H₂PO₄⁻ and 9e10a calculations on H₂PO₄. Of the 10 active orbitals, 4 have b₁ symmetry, and 2 each
have a₁, a₂, and b₂ symmetries. This set of calculations yielded the five lowest energy IPs of
H₂PO₄⁻. The fifth IP corresponds to removal of an electron from the 5b₁ orbital. For H₂PO₄⁻, the
first and fifth IPs, which both arise from removal of a b₁ electron, were obtained via a third set
of CASSCF calculations. In this case, the ionization potentials (denoted 10e10a/av.9e10a) were
obtained from calculations for which the average energies associated with the two ²B₁ states of
H₂PO₄ (a²B₁ and b²B₁) were optimized.
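
Because each IP is an energy difference at a fixed geometry, it can be recomputed directly from tabulated total energies. The sketch below uses the CASPT2 10e10a/9e10a total energies of H₂PO₄⁻ and the a²B₁ radical as listed in Table II; the function name and the rounded hartree-to-eV factor are choices made here, not the authors' code.

```python
HARTREE_TO_EV = 27.211386  # approximate conversion factor


def vertical_ip_ev(e_anion_hartree, e_radical_hartree):
    """Vertical IP as E(radical) - E(anion), both at the anion geometry, in eV."""
    return (e_radical_hartree - e_anion_hartree) * HARTREE_TO_EV


if __name__ == "__main__":
    e_anion = -642.151389    # H2PO4-, CASPT2 with the 10e10a reference (Table II)
    e_radical = -641.992279  # H2PO4 a2B1, CASPT2 with the 9e10a reference (Table II)
    print(round(vertical_ip_ev(e_anion, e_radical), 2))  # 4.33
```

The result agrees with the CASPT2 10e10a/9e10a first IP reported for H₂PO₄⁻ in Table V.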

For CH₃HPO₄⁻, ionization potentials were also obtained from 8e8a/7e8a and 10e10a/9e10a
calculations. However, the CASSCF calculation of the lowest roots for the ²A′ and ²A″ states of
CH₃HPO₄ yields only two IPs, corresponding to removal of an electron from the 20a′ and 9a″
orbitals, respectively. Attempts to perform CASSCF calculations with the optimization carried out
for the second root corresponding to the ²A′ and ²A″ states were unsuccessful. In order to obtain
IPs associated with the two lowest energy ²A′ states of CH₃HPO₄, and the two lowest energy ²A″
states, we performed CASSCF calculations optimized for the average energy of the two lowest states
of each symmetry. These states, for which the average energies were optimized, are denoted a²A′
and b²A′, and a²A″ and b²A″, respectively. Values of IPs, based on the calculations for which the
average energies were optimized, are denoted 8e8a/av.7e8a and 10e10a/av.9e10a. For (CH₃)₂PO₄⁻,
values of the four lowest energy IPs were obtained from 8e8a/7e8a calculations.

Second-order perturbation theory (CASPT2) calculations with the reference CASSCF 8e8a/7e8a,
8e8a/av.7e8a, 10e10a/9e10a, and 10e10a/av.9e10a wave functions have also been employed to evaluate
selected IPs of H₂PO₄⁻, CH₃HPO₄⁻, and (CH₃)₂PO₄⁻. The CASPT2 energy values reported correspond to
the nondiagonal approach. Finally, the first five IPs of H₂PO₄⁻ and the first four IPs of CH₃HPO₄⁻
have been obtained by combining previously reported [12, 18, 19] values for the lowest energy IPs
obtained from MP2/6-31+G* calculations with results from CASPT2 calculations.
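
The text does not spell out the arithmetic of this combination. One reading, assumed here, is that the CASPT2 IPs are shifted rigidly so that the first one coincides with the MP2 value, which is the same as adding CASPT2 excitation energies of the radical to the MP2 first IP. With the H₂PO₄⁻ CASPT2 10e10a/9e10a values from Table V and the MP2 first IP of 4.89 eV, this shift reproduces the five values quoted in the abstract. The function name is hypothetical.

```python
def combine_ips(first_ip_reference, ips_correlated):
    """Shift a list of IPs so that its first entry equals a reference first IP."""
    lowest = ips_correlated[0]
    return [round(first_ip_reference + (ip - lowest), 2) for ip in ips_correlated]


if __name__ == "__main__":
    caspt2_h2po4 = [4.33, 4.86, 5.74, 6.08, 6.85]  # CASPT2 10e10a/9e10a, Table V
    mp2_first_ip = 4.89                             # MP2/6-31+G* first IP
    print(combine_ips(mp2_first_ip, caspt2_h2po4))  # [4.89, 5.42, 6.3, 6.64, 7.41]
```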


Results  and  Discussion 

The  total  energies  of  all  species  are  listed  in 
Tables  II  to  IV,  and  the  corresponding  IPs  are 
reported  in  Table  V.  In  Table  V,  IPs  calculated  at 
the  CASSCF  level  are  0.28-0.81  eV  smaller  than 
IPs  calculated  at  the  SCF  level.  This  difference 
arises  because  the  closed-shell  anion  is  better  rep¬ 
resented  by  a  single  determinant  wave  function 
than  the  open-shell  radicals.  For  H2POT,  the  coef- 


TABLE II

Total energies of H₂PO₄⁻ and H₂PO₄ in a.u. (a)

               H₂PO₄⁻          H₂PO₄
               ¹A₁             a²B₁        ²A₂         ²B₂         ²A₁         b²B₁
E(SCF)         -0.477068       -0.305359   -0.277967   -0.237350   -0.227009
E(CASSCF) (b)  -0.573979       -0.418271   -0.402465   -0.359464   -0.342319
E(CASSCF) (c)  -0.592194       -0.438787   -0.418125   -0.375717   -0.371143   -0.322455
E(CASSCF) (d)                  -0.430378                                       -0.286281
E(CASPT2) (e)  -0.149818 (f)   -0.989762   -0.971146   -0.939456   -0.925549
E(CASPT2) (g)  -0.151389 (f)   -0.992279   -0.972808   -0.940458   -0.927958   -0.899667
E(CASPT2) (h)                  -0.993831                                       -0.914097

(a) Relative to -641.000000 a.u. unless otherwise indicated.
(b) CASSCF 8e8a or 7e8a.
(c) CASSCF 10e10a or 9e10a.
(d) CASSCF 9e10a optimized for the average energy of the a²B₁ and b²B₁ states of H₂PO₄
    (denoted CASSCF av.9e10a).
(e) With the CASSCF 8e8a or 7e8a reference wave functions.
(f) Relative to -642.000000 a.u.
(g) With the CASSCF 10e10a or 9e10a reference wave functions.
(h) With the CASSCF av.9e10a reference wave function.
TABLE III

Total energies of CH₃HPO₄⁻ and CH₃HPO₄ in a.u. (a)

               CH₃HPO₄⁻     CH₃HPO₄
                            a²A′        b²A′        a²A″        b²A″
E(SCF)         -0.482995    -0.314723               -0.248211
E(CASSCF) (b)  -0.579359    -0.428128               -0.368744
E(CASSCF) (c)               -0.426110   -0.402247   -0.357785   -0.350327
E(CASSCF) (d)  -0.599325    -0.450873               -0.394395
E(CASSCF) (e)               -0.449348   -0.422732   -0.380919   -0.372202
E(CASPT2) (f)  -0.249707    -0.092824               -0.042585
E(CASPT2) (g)               -0.092638               -0.041766   -0.028106
E(CASPT2) (h)  -0.250381    -0.093604               -0.046247
E(CASPT2) (i)               -0.098937   -0.071913               -0.030707

(a) SCF and CASSCF energies are relative to -680.000000 a.u. CASPT2 energies are relative to
    -681.000000 a.u.
(b) CASSCF 8e8a or 7e8a.
(c) CASSCF 7e8a optimized for the average energy of the two a²A′ and b²A′ states, and the two
    a²A″ and b²A″ states (denoted CASSCF av.7e8a).
(d) CASSCF 10e10a or 9e10a.
(e) CASSCF 9e10a optimized for the average energy of the two a²A′ and b²A′ states, and the two
    a²A″ and b²A″ states (denoted CASSCF av.9e10a).
(f) With the CASSCF 8e8a or 7e8a reference wave functions.
(g) With the CASSCF av.7e8a reference wave function.
(h) With the CASSCF 10e10a or 9e10a reference wave functions.
(i) With the CASSCF av.9e10a reference wave function.


ficient  of  the  leading  configuration  in  the  CASSCF 
wave  function  is  0.975.  For  the  ground-state  radi¬ 
cals,  the  coefficients  of  the  leading  configuration 
are  in  the  range  0.95-0.96. 

The results in Table V demonstrate that ionization potentials from a CASSCF calculation optimized
for the average energy of two states are not accurate. When calculations were carried out with a
9e10a reference wave function, optimized for the average energy of the a²B₁ and b²B₁ states of
H₂PO₄, the CASSCF 10e10a/av.9e10a values for the a²B₁ and b²B₁ ionization potentials were 4.40 and
8.32 eV. When the calculation was optimized for the a²B₁ and b²B₁ states separately, the
corresponding CASSCF 10e10a/9e10a ionization potentials were 4.17 and 7.34 eV. In contrast, the IPs
obtained from CASPT2 calculations employing an averaged CASSCF 9e10a reference wave function to
describe H₂PO₄ (CASPT2 10e10a/av.9e10a) are in closer agreement with IPs obtained from CASPT2
calculations with separately optimized wave functions (CASPT2 10e10a/9e10a). The CASPT2
10e10a/av.9e10a values for the a²B₁ and b²B₁ ionization potentials of H₂PO₄⁻ are 4.29 and 6.46 eV,
respectively; the CASPT2 10e10a/9e10a values are 4.33 and 6.85 eV. Similar trends have


TABLE IV

Total energies of (CH₃)₂PO₄⁻ and (CH₃)₂PO₄ in a.u. (a)

               (CH₃)₂PO₄⁻   (CH₃)₂PO₄
               ¹A₁          ²B₁         ²A₂         ²B₂         ²A₁
E(SCF)         -0.488963    -0.324054   -0.294497   -0.257729   -0.248975
E(CASSCF) (b)  -0.585302    -0.435846   -0.418264   -0.377721   -0.361671
E(CASPT2) (c)  -0.349900    -0.195698   -0.173287   -0.147411   -0.134618

(a) SCF and CASSCF energies relative to -719.000000 a.u. CASPT2 energies relative to
    -720.000000 a.u.
(b) CASSCF 8e8a or 7e8a.
(c) With the CASSCF 8e8a or 7e8a reference wave functions.
TABLE V

Calculated ionization potentials (in eV) of H₂PO₄⁻, CH₃HPO₄⁻, and (CH₃)₂PO₄⁻.

Anion: H₂PO₄⁻
Radical state   a²B₁    ²A₂     ²B₂     ²A₁     b²B₁
SCF             4.68    5.41    6.53    6.80
CASSCF (a)      4.22    4.67    5.84    6.30
CASSCF (b)      4.17    4.74    5.89    6.01    7.34
CASSCF (d)      4.40                            8.32
CASPT2 (e)      4.36    4.86    5.72    6.10
CASPT2 (f)      4.33    4.86    5.74    6.08    6.85
CASPT2 (h)      4.29                            6.46

Anion: CH₃HPO₄⁻
Radical state   a²A′    b²A′    a²A″    b²A″
SCF             4.57            6.39
CASSCF (a)      4.12            5.74
CASSCF (b)      4.04            5.58
CASSCF (c)      4.18    4.82    6.03    6.24
CASSCF (d)      4.08    4.80    5.94    6.18
CASPT2 (e)      4.27            5.63
CASPT2 (f)      4.27            5.55
CASPT2 (g)      4.27            5.66    6.03
CASPT2 (h)      4.12    4.86            5.98

Anion: (CH₃)₂PO₄⁻
Radical state   ²B₁     ²A₂     ²B₂     ²A₁
SCF             4.49    5.28    6.29    6.50
CASSCF (a)      4.07    4.54    5.65    6.08
CASPT2 (e)      4.20    4.81    5.51    5.86

(a) CASSCF 8e8a/7e8a.
(b) CASSCF 10e10a/9e10a.
(c) CASSCF 8e8a/av.7e8a.
(d) CASSCF 10e10a/av.9e10a.
(e) CASPT2 8e8a/7e8a.
(f) CASPT2 10e10a/9e10a.
(g) CASPT2 8e8a/av.7e8a.
(h) CASPT2 10e10a/av.9e10a.


been observed in other calculations of excited states [44].

The most rigorously calculated first ionization potentials of H₂PO₄⁻ and CH₃HPO₄⁻ were obtained
from CASPT2 10e10a/9e10a calculations and have values of 4.33 and 4.27 eV, respectively. A
comparison of CASPT2 results for H₂PO₄⁻ and CH₃HPO₄⁻ indicates that for each anion the first IPs
obtained with the 9e10a and av.9e10a H₂PO₄ reference wave functions differ by no more than 0.15 eV.
For the higher energy IPs, larger differences sometimes occur. For example, for the fifth IP of
H₂PO₄⁻, and the second IP of CH₃HPO₄⁻, the differences between the CASPT2 values obtained with the
9e10a and the av.9e10a reference wave functions are 0.39 and 0.43 eV, respectively.

The CASPT2 10e10a/9e10a values of the first four IPs of H₂PO₄⁻, and of the first and third IPs of
CH₃HPO₄⁻ differ from the CASPT2 8e8a/7e8a values by no more than 0.08 eV. The CASPT2 10e10a/9e10a
values of the first IPs of H₂PO₄⁻ and CH₃HPO₄⁻ (4.33 and 4.27 eV) are 0.56 and 0.42 eV smaller
than IPs obtained from MP2/6-31+G* calculations [12, 18, 19]. In test MP2/6-31+G* calculations on
CH₃O⁻, PO₂⁻, and PO₃⁻, theoretical first IPs differed from experimental values by less than 0.3 eV
[12]. This observation suggests that the first IPs predicted by the MP2/6-31+G* calculations are
more accurate than the first IPs obtained from the CASPT2 calculations. Of course, the MP2/6-31+G*
calculations provide no reliable information about higher energy ionization potentials.

With the 6-31+G* basis set, earlier investigations [12, 18] employing a combination of MP2 and CIS
methods (denoted MP2/CIS) provided values for the second through fifth IPs of H₂PO₄⁻, associated
with the ²A₂, ²B₂, ²A₁, and b²B₁ states of H₂PO₄. Similarly, MP2/CIS calculations [18, 19] were
carried out to obtain the second through fifth IPs of CH₃HPO₄⁻, which give rise to the a²A′, b²A′,
a²A″, and b²A″ states of CH₃HPO₄. The CIS excitation energies for H₂PO₄ are, in some cases,
significantly different from CASPT2 and CASSCF excitation energies. For H₂PO₄, the first (²A₂),
second (²B₂), and fourth (b²B₁) excitation energies, and for CH₃HPO₄, the first (b²A′) and second
(a²A″) excitation energies obtained from all of the CASPT2 calculations are 0.47 to 1.74 eV smaller
than corresponding excitation energies obtained from CIS calculations. In cases where comparisons
were made, results from CASSCF calculations were similar. Here, the CASSCF 7e8a and 9e10a
excitation energies for the ²A₂, ²B₂, and b²B₁ states of H₂PO₄, and for the a²A″ state of CH₃HPO₄
are between 0.52 and 0.87 eV smaller than the CIS energies.

The CASPT2 and CASSCF values for the third excitation energies associated with the ²A₁ and b²A″
states of H₂PO₄ and CH₃HPO₄, respectively, are larger than values obtained from CIS calculations.
Results from CASPT2 7e8a and 9e10a calculations on H₂PO₄ and av.7e8a and av.9e10a calculations on
CH₃HPO₄ yield ²A₁ and b²A″ excitation energies that are 0.27-0.40 eV larger than the CIS energies.
Similar results were obtained from CASSCF 7e8a and 9e10a calculations on H₂PO₄, which predict that
the ²A₁ excitation energy is 0.37-0.61 eV larger than that obtained from CIS calculations.

In addition to differences in the absolute values of the H2PO4- and
CH3HPO4- ionization potentials, the differences between the CASPT2 and
CASSCF excitation energies and the CIS excitation energies lead to
differences in the energetic ordering of the IPs. For H2PO4-, results
from CASPT2 10e10a/9e10a and 8e8a/7e8a calculations indicate that the
2B2 ionization potential is smaller than that of 2A1 by 0.34 and
0.38 eV, respectively. The same ordering is predicted by CASSCF
8e8a/7e8a and 10e10a/9e10a calculations. In contrast, the MP2/CIS
calculations predict that the 2A1 ionization potential (6.36 eV) is
0.88 eV smaller than that of 2B2 [12, 18].

For CH3HPO4-, differences between the CASPT2 and MP2/CIS results are
similar to those occurring for H2PO4-. CASPT2 8e8a/av.7e8a calculations
predict that the a2A' ionization potential (5.66 eV) is 0.47 eV smaller
than that of b2A'. Here, the MP2/CIS calculations predict that the b2A'
ionization potential (6.20 eV) is 0.63 eV smaller than the a2A'
ionization potential. As for H2PO4-, the second through fourth IPs
obtained from the CASPT2 calculations are smaller than those obtained
from MP2/CIS calculations. The CASPT2 10e10a/9e10a value for the third
IP, and the CASPT2 10e10a/av.9e10a values for the second and fourth
IPs, are smaller than the MP2/CIS values by 1.28, 1.04, and 0.22 eV,
respectively.

For (CH3)2PO4-, the ordering of the first four IPs obtained from SCF,
and from 8e8a/7e8a CASSCF and CASPT2 calculations, is the same as that
predicted for H2PO4-. Like the results obtained for H2PO4-, the CASSCF
and CASPT2 ionization potentials for (CH3)2PO4- are smaller than the
SCF ionization potentials. The differences are between 0.29 and
0.78 eV. Also as for H2PO4-, the difference between corresponding
(CH3)2PO4- ionization potentials obtained from CASSCF and CASPT2
8e8a/7e8a calculations is smaller than the differences between either
the CASSCF or CASPT2 values and the SCF values. The differences between
IPs obtained from the CASSCF and CASPT2 calculations, with the
8e8a/7e8a wave functions, are in the ranges of 0.12-0.20 eV and
0.13-0.37 eV for H2PO4- and (CH3)2PO4-, respectively.

Figures 2-6 show electron density difference maps (electron holes)
associated with the five lowest IPs of H2PO4-. The electron holes were
obtained by subtracting the electron density of the radical calculated
at the CASSCF 9e10a level from the electron density of the anion
calculated at the CASSCF 10e10a level. The figures correspond to cross
sections through two planes. One contains the P atom and the negatively
charged O atoms (O1 and O2). The second plane passes through O1 and O2
and is perpendicular to the first plane. The electron holes associated
with creation of the five lowest energy states of the H2PO4 radical
(a2B1, 2A2, 2B2, 2A1, and b2B1) are similar to those predicted from an
earlier Koopmans' analysis of results from SCF calculations on H2PO4-
with the 6-31G basis set, and to MP2/6-31+G* and MP2/CIS results on
H2PO4- and H2PO4 [12].

Besides descriptions of the loss in electron density that occurs in the
transitions from H2PO4- to H2PO4, the CASSCF electron density
difference maps of Figures 2-6 show regions where electron density
increases. For example, in the electron distribution associated with
the a2B1 ionization potential of H2PO4- (Fig. 2), the electron density
increases in the region around phosphorus and in some regions where
there is a large contribution from 2p-bonding and 2p-lone-pair orbitals
on the negatively charged O1 and O2 atoms. For the 2A2 ionization
potential (Fig. 3), the electron density increases in orbitals
contributing to O1PO2 and O1O2 bonding interactions. Similar regions of
increased electron density are observed in electron holes associated
with the 2B2, 2A1, and b2B1 ionization potentials.

An earlier investigation [12] provided evidence that MP2 calculations
with the 6-31+G* basis set provide generally good predictions of the
first ionization potentials of phosphorus- and oxygen-containing
anions. With this in mind, it is likely that a combination of earlier
reported MP2 results and results from CASPT2 calculations provides
reasonably accurate values for the second and higher ionization
potentials. Table VI lists values of the lowest energy four and five
IPs obtained from MP2/6-31+G* and CASPT2 calculations for CH3HPO4- and
H2PO4-, respectively. Here, the first IP is taken from earlier
MP2/6-31+G* results [12, 18, 19] and the radical excitation energies
are taken from the CASPT2 results. In this case, the values for the
second ionization potential of H2PO4- (2A1), and the corresponding b2A'
ionization potential of CH3HPO4-, differ from the earlier reported
MP2/CIS ionization potentials by no more than 0.35 eV. However, in
Table VI, the values for the third, fourth, and fifth ionization
potentials of H2PO4- (2A2, 2B2, and b2B1), and the third and fourth
ionization potentials of CH3HPO4- (b2A" and a2A'), are 0.49 to 1.31 eV
smaller than the MP2/CIS values. For the first IPs of H2PO4- and
CH3HPO4-, the uncertainty of ±0.3 eV given in Table VI is based on a
comparison of MP2/6-31+G* results with experimental IPs for CH3O-,
PO2-, and PO3-. The uncertainty associated with the second, third, and
fourth IPs of H2PO4- and CH3HPO4- (±0.5 eV) reflects the uncertainty in
the MP2/6-31+G* values for the first IPs, and the variation in values
of the radical excitation energies obtained from both CASSCF and CASPT2
calculations. Similarly, the larger uncertainty (±0.8 eV) assigned to
the fifth IP of H2PO4- is based on the wider variation in CASPT2 and
CASSCF results associated with the excitation energy for the b2B1 state
of H2PO4.

FIGURE 2. Electron hole associated with the state a2B1 of H2PO4:
(a) through the plane PO1O2; (b) through the plane parallel to y, 0, z
and passing through O1 and O2. Dashed lines indicate regions in which
charge density was removed on going to the radical, and solid lines
indicate regions in which there is increased charge density in the
radical.

FIGURE 3. Electron hole associated with the state 2A2 of H2PO4:
(a) through the plane PO1O2; (b) through the plane parallel to y, 0, z
and passing through O1 and O2.

FIGURE 4. Electron hole associated with the state 2B2 of H2PO4:
(a) through the plane PO1O2; (b) through the plane parallel to y, 0, z
and passing through O1 and O2.

FIGURE 5. Electron hole associated with the state 2A1 of H2PO4:
(a) through the plane PO1O2; (b) through the plane parallel to y, 0, z
and passing through O1 and O2.
FIGURE 6. Electron hole associated with the state b2B1 of H2PO4:
(a) through the plane PO1O2; (b) through the plane parallel to y, 0, z
and passing through O1 and O2.

Conclusions

1. The lowest energy vertical ionization potentials of H2PO4- and
CH3HPO4- predicted from CASPT2 10e10a/9e10a calculations are 4.33 and
4.27 eV, respectively. For (CH3)2PO4-, the lowest energy IP obtained
from CASPT2 8e8a/7e8a calculations is 4.20 eV. The CASPT2 10e10a/9e10a
values of the first ionization potentials of H2PO4- and CH3HPO4- are
0.56 and 0.42 eV smaller than the values obtained from previously
reported MP2 calculations with the 6-31+G* basis set. An earlier
comparison of MP2/6-31+G* ionization potentials of phosphorus- and
oxygen-containing anions [12] suggests that the MP2/6-31+G* values for
the first IPs are more accurate.

2. Significant differences between the CASSCF and CASPT2 excitation
energies for H2PO4 and CH3HPO4 and the earlier reported energies
obtained from CIS calculations with a 6-31+G* basis set lead to a
different ordering of the H2PO4- and CH3HPO4- ionization potentials.
Results from the MP2/CIS calculations indicate that, for H2PO4-
(CH3HPO4-), the 2B2 (a2A') ionization potential is greater than the
2A1 (b2A') ionization potential. Results from the CASSCF and CASPT2
calculations predict that these orderings are reversed.

3. While the CASPT2 ionization potentials of H2PO4- are smaller than
earlier reported IPs obtained from MP2/6-31+G* and MP2/CIS
calculations, and the ordering of IPs is different, the descriptions
that the current and the earlier results provide of electron holes with
corresponding symmetries are qualitatively similar to one another.
These descriptions are also similar to those obtained by applying
Koopmans' theorem to results from split-valence basis set SCF
calculations on the ground-state anions.

4. Currently, the most accurate values of the four and five lowest
energy IPs of CH3HPO4- and H2PO4-, respectively, are believed to arise
from a combination of results from MP2/6-31+G* calculations of the
first IPs with results from CASPT2 calculations of CH3HPO4 and H2PO4
excitation energies.
TABLE VI
Ionization potentials of H2PO4- and CH3HPO4- obtained by combining
MP2/6-31+G* and CASPT2 results.

              Ionization potentials a
              IP1          IP2          IP3          IP4          IP5
H2PO4- b      4.89 ± 0.3   5.42 ± 0.5   6.30 ± 0.5   6.64 ± 0.5   7.41 ± 0.8
CH3HPO4- c    4.69 ± 0.3   5.43 ± 0.5   6.08 ± 0.5   6.55 ± 0.5

a In electron volts. MP2/6-31+G* results obtained from Refs. [12],
[18], and [19]. The estimated uncertainty in the reported values is
discussed in the text.

b The first five IPs associated with formation of the a2B1, 2A2, 2B2,
2A1, and b2B1 radical states, respectively. Obtained from CASPT2
10e10a/9e10a results.

c Ionization potentials associated with formation of the a2A", b2A",
a2A', and b2A' radical states. Ionization potentials for the b2A" and
b2A' states obtained from CASPT2 10e10a/av.9e10a results. Ionization
potentials for the a2A' state obtained from CASPT2 8e8a/av.7e8a
results.
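
The combination scheme behind Table VI amounts to IP_n = IP_1(MP2) +
dE_n(CASPT2), where dE_n is the excitation energy of the radical. As a
minimal sketch, the Python lines below recover the excitation energies
implied by the tabulated values and rebuild the IPs from them. The
dictionary and function names are illustrative, and the excitation
energies are back-calculated from Table VI rather than quoted from the
CASPT2 output.

```python
# Table VI values in eV, transcribed from the text; the first entry of each
# list is the MP2/6-31+G* first ionization potential.
TABLE_VI = {
    "H2PO4-": [4.89, 5.42, 6.30, 6.64, 7.41],
    "CH3HPO4-": [4.69, 5.43, 6.08, 6.55],
}


def implied_excitation_energies(ips):
    """Return IP_n - IP_1 for n >= 2 (radical excitation energies in eV)."""
    first = ips[0]
    return [round(ip - first, 2) for ip in ips[1:]]


def combine(first_ip, excitation_energies):
    """Rebuild the IP list as IP_1 plus each radical excitation energy."""
    return [first_ip] + [round(first_ip + de, 2) for de in excitation_energies]


for anion, ips in TABLE_VI.items():
    excitations = implied_excitation_energies(ips)
    print(anion, excitations, combine(ips[0], excitations))
```

The uncertainties quoted in the table are not propagated here, since the
text assigns them by comparison with experiment and by the spread of the
CASSCF and CASPT2 results rather than by a formula.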


sity  of  Illinois  at  Chicago,  the  Cornell  Theory  Cen¬ 
ter,  and  the  National  Center  for  Supercomputing 
Applications, at the University of Illinois at Urbana-Champaign.
ABSTRACT: A conformational and electronic semiempirical quantum-chemical study
of  several  N-substituted  valpromides  is  presented,  followed  by  a  similarity  analysis  that 
takes  into  account  the  flexibility  of  the  molecules.  Rigid  analogs  are  included  in  the 
comparison  in  order  to  help  identify  the  anticonvulsant  active  conformations.  On  the 
basis  of  a  superposition  analysis,  which  includes  both  active  and  nonactive  structures, 
and  uses  the  global  minimum-energy  conformation  of  phenytoin  as  a  template,  the 
pharmacophoric  pattern  of  N-substituted  valpromides  is  defined.  It  is  related  to  the 
antiperiplanar  orientation  of  the  amide  function  relative  to  the  hydrocarbon  chain. 

© 1997 John Wiley & Sons, Inc. Int J Quant Chem 65: 1107-1114, 1997

Key  words:  Valpromide  derivatives;  anticonvulsant  activity;  similarity  analysis; 
pharmacophore 


Introduction 

Since the introduction of carbamazepine and valproate into clinical
practice during the 1970s, the clinical science of epilepsy has
progressed considerably. Nowadays, epileptic seizures are most commonly
treated with one of four antiepileptic drugs (AEDs): carbamazepine
(cbz, 31%), phenytoin (phen, 31%), valproate (vpa, 26%), and
phenobarbital (pb, 18%) [1-3]. New promising compounds, such as
felbamate [4, 5], gabapentin, remacemide, and tiagabine [6], among
others, have also recently been marketed.

Correspondence to: G. L. Estiu.

Contract grant sponsors: CONICET, Universidad Nacional de La Plata,
Cooperativa Farmaceutica de Quilmes (COFARQUIL), Laboratories Bago, and
Colegio de Farmaceuticos de la Provincia de Buenos Aires.

The  optimal  goal  for  epilepsy  treatment  is  the 
complete  control  of  seizure  with  no  adverse  ef¬ 
fects.  However,  the  achievement  of  a  balance  be¬ 
tween  seizure  control  and  adverse  effects,  which 
include  neurotoxicity,  hepatotoxicity,  and  even  ter¬ 
atogenicity  [1,  2,  6],  still  remains  a  challenge  for 
the most expert scientist. The success in developing more effective
AEDs in the future will depend, in large part, on our ability to
broaden our understanding of the molecular mechanisms of
epileptogenesis. Molecular similarity studies, either based
on  the  superimposition  of  nuclei,  the  comparison 
of  electron  density  matrices,  or  on  any  other  prin¬ 
ciple  [7],  have  been  extremely  useful  for  this  pur¬ 
pose,  as  they  allow  one  to  identify  the  structural 
and  electronic  requirements  associated  with  a  par¬ 
ticular  activity.  The  overall  picture  is,  however, 
complicated  by  the  different  pharmacokinetic 
(metabolic)  paths  that  similar  structures  might  un¬ 
dergo  in  the  biological  media.  A  typical  example 
of  this  complicated  picture  is  given  by  vpa  deriva¬ 
tives.  Vpa  is  recognized  as  a  first-line  AED,  be¬ 
cause  of  its  broad  spectrum  of  activity  and  its 
lower  neurotoxicity  [1,  8].  The  primary  amide  of 
vpa,  valpromide  (vpd)  has  been  found  to  be  2-5 
times  more  potent  than  vpa  in  mice  [9,  10].  Phar¬ 
macokinetic  studies  seem  to  indicate,  on  the  other 
hand,  that  it  is  biotransformed  to  the  acid,  to  the 
extent  of  30-40%  in  dogs  [11,  12],  and  80%  in 
humans  [13]. 

In  spite  of  the  complication  related  to  their 
different  metabolic  behavior  in  the  different 
species, the amides represent an interesting alternative because they
are more active as anticonvulsants, less teratogenic, and probably less
hepatotoxic than the respective acids [14]. Several analogs of vpd have
been studied with respect to their pharmacokinetics, and for some of
them (e.g., valnoctamide) the
anticonvulsant  activity  has  been  determined.  From 
the  results  derived  from  those  studies,  it  has  been 
inferred  that  the  amides  that  do  not  undergo  bio¬ 
transformation  to  the  acid  are  the  most  active  ones 
[14,  15].  This  statement  implies  that  the  pharma¬ 
cokinetics  is  the  limiting  factor  for  the  anticonvul¬ 
sant  activity. 

In  the  systematic  analysis  that  has  led  to  that 
conclusion,  only  two  N-substituted  derivatives 
(N-methyl  tetramethylcyclopropane  carboxamide 
and  tetramethyl  cyclopropylcarbonylglycinamide) 
have been included [15]. However, to our knowledge, neither the
pharmacokinetics nor the anticonvulsant activity of N-substituted
valpromides has been studied in a systematic way aimed at discerning
the influence of N-substitution on the AE activity. We have focused on
this group of molecules and, in addition to the synthesis and
biological testing of the AE activity, we have performed a similarity
analysis, based on the superposition of conformers followed by the
comparison of their charges, calculated by means of quantum-chemical
methodologies. This analysis was aimed at assessing the structural and
electronic requirements associated with the activity.

This  article  mainly  concerns  the  details  of  the 
similarity  analysis  for  a  series  of  N-substituted 
valpromides,  which  have  been,  as  mentioned  be¬ 
fore,  synthesized  and  biologically  tested  in  our  lab. 
The  comparative  study,  which  includes  both  active 
and  inactive  structures,  as  well  as  rigid  analogs 
selected  to  confirm  the  inferences  derived  from  the 
conformational analysis, can be defined as a structure-activity
relationship (SAR) analysis and has
allowed  us  to  identify  the  pharmacophoric  pattern 
for  the  AE  active  N-substituted  valpromides. 


Methodologies:  Calculation  Procedure 

Six N-substituted valpromides (N-butyl-vpd, buvpd; N,N-diethyl-vpd,
dievpd; N-cyclohexyl-vpd, chvpd; N-isopropyl-vpd, ipvpd;
N-(4-carboxyphenyl)-vpd, cpvpd; and N-morpholin-vpa, mvpd) have been
synthesized and biologically tested for their AE activity. In order to
gain insight into the microscopic origin of the manifested activity, a
molecular similarity analysis was performed after a thorough
quantum-chemical conformational study. Similarity was based on the
comparison of both geometric and electronic descriptors, defined,
respectively, by the position of the nuclei and the charges on the
atomic centers, taking into account that the receptor site perceives
charge distributions of the approaching molecule [16]. Rigid analogs,
rigidified by cyclization, have been included in the comparison, in
order to help discern the conformation of the flexible molecules.

Details  of  the  synthesis  and  pharmacological 
studies  are  given  in  Ref.  [17].  In  a  general  scheme, 
the  synthesis  can  be  described  as  an  SN1  reaction 
of  valproyl  chloride  with  N-butyl,  N,N-diethyl, 
N-cyclohexyl,  N-isopropyl,  N-(4-carboxyphenyl)- 
amide  and  morpholine,  respectively. 

The  pharmacologic  tests  were  performed  ac¬ 
cording  to  standard  procedures  provided  by  the 
Antiepileptic  Drug  Development  Program  of  the 
National  Institute  of  Neurological  and  Commu¬ 
nicative  Disorders  and  Stroke  (NINCDS)  [18,  19]. 
Maximal electroshock seizure (MES) and pentylenetetrazol seizure
threshold (PTS) tests were employed to determine the anticonvulsant
activity. The rotorod test was used to determine the acute toxicity.
The AE activity was expressed as ED50 (the dose that is effective in
50% of the animals tested), and estimated, with its 95% confidence
limits, by probit analysis [20].
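
To make the ED50 estimate concrete, the following is a minimal sketch of
a probit fit in Python, assuming dose-response data given as the
fraction of animals protected at each dose. It uses an unweighted
least-squares line through the probit-transformed fractions against
log10(dose), which is a simplification of the iteratively weighted
procedure normally meant by probit analysis, and it does not compute
the 95% confidence limits. The function name and the example doses are
hypothetical and are not data from this study.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit transform


def probit_ed50(doses, fractions):
    """Estimate ED50 from doses and fractions of animals responding.

    Fits probit(p) = intercept + slope * log10(dose) by ordinary least
    squares and returns (ed50, slope). Fractions of exactly 0 or 1 are
    skipped because their probits are infinite.
    """
    points = [
        (math.log10(dose), _STD_NORMAL.inv_cdf(p))
        for dose, p in zip(doses, fractions)
        if 0.0 < p < 1.0
    ]
    if len(points) < 2:
        raise ValueError("need at least two doses with 0 < fraction < 1")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # probit = 0 corresponds to a 50% response
    return 10.0 ** (-intercept / slope), slope


# Hypothetical example: synthetic fractions generated from ED50 = 100, slope = 2.
example_doses = [25.0, 50.0, 100.0, 200.0, 400.0]
example_fractions = [
    _STD_NORMAL.cdf(2.0 * (math.log10(d) - 2.0)) for d in example_doses
]
ed50, slope = probit_ed50(example_doses, example_fractions)
```

Working on log10(dose) follows the usual convention for dose-response
curves; with real data, the group sizes would enter as weights.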

Because the size of the molecules is not compatible with good quality
ab initio calculations, an AM1 model Hamiltonian [21] (MOPAC 7.0
package [22]) was chosen for the conformational search. For each
derivative, the structures used as initial guesses for a
gradient-driven full geometry optimization were generated by modifying
the torsional angles τ5 and τ6 (Fig. 1), and those defined in the
hydrocarbon chain (τ1-τ4). These and the other geometry parameters were
completely relaxed during the optimizations.

FIGURE 1. Antiperiplanar conformation of vpd. τi = torsional angles
varied in the conformational study (i = 1-6); τ5 = O9C8C4H11,
τ6 = O9C8NC.

The conformational search has been performed as follows:

1. The τ5 value was modified in 90° steps from 0° to 180° (270° is
equivalent to 90° for this set of molecules). Intermediate values were
not considered because all the optimizations starting from the
above-mentioned ones converged to values close to either τ5 = 0° or
τ5 = 180°.

2. For each of the τ5 values, τ6 has been varied in 90° steps. In a
similar fashion to that described for τ5, two minima were found,
associated, respectively, with the orientation of the less voluminous
substituent (a hydrogen atom in the case of single N-substitution)
toward the hydrocarbon chain or opposite to it. The first one is the
more stable because it minimizes steric repulsion.

3. As will be further discussed, it is well known that the "all-trans"
conformation is the most stable for the hydrocarbon chain. A thorough
discussion of this subject can be found in Ref. [23]. This conformation
has been confirmed, however, for the different derivatives, by means of
distortions of the τ1-τ4 angles by 60° and 90° from their starting
180°, followed by full optimization of the resulting structure.

4. Several minima were defined, after the geometry optimization
procedure, which were associated with the local conformation of the
N-substituents. The largest number of minima has been found for the
most flexible molecule, N-butylvalpromide: three minima related to the
coordinates of the substituent have been found for each pair of values
of τ5 and τ6, with an energy difference among them lower than
3 kcal/mol. The one of lowest energy has always been considered in
further steps of the study.
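
The bookkeeping of steps 1 and 2 can be sketched in a few lines of
Python. This is only an illustration: the 0°-270° range for τ6 and the
90° threshold used to label an optimized τ5 value are assumptions (the
text states only the step size for τ6), the distortions of τ1-τ4 in
step 3 are left out, and all names are hypothetical.

```python
from itertools import product

TAU5_STARTS = (0, 90, 180)       # degrees; 270 is equivalent to 90 (step 1)
TAU6_STARTS = (0, 90, 180, 270)  # degrees; assumed range, 90-degree steps (step 2)


def starting_torsions():
    """All (tau5, tau6) pairs used as starting points for full optimization."""
    return list(product(TAU5_STARTS, TAU6_STARTS))


def wrap_angle(angle):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def classify_tau5(angle):
    """Label an optimized tau5 value by the minimum it lies closest to."""
    if abs(wrap_angle(angle)) < 90.0:
        return "synperiplanar"
    return "antiperiplanar"


grid = starting_torsions()
print(len(grid), "starting geometries per derivative")
```

Each pair would then be written into the input geometry of the
semiempirical program and optimized without constraints, after which
`classify_tau5` indicates which of the two minima was reached.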

We have chosen AM1 as the calculation procedure on the basis of a
comparison of the results of semiempirical AM1, PM3, MNDO [22], and
ZINDO/S [24, 25] gas-phase calculations, the latter at the
configuration interaction singles (CIS) and singles and doubles (CISD)
levels, with those of ab initio calculations of different quality (G94
[26]), for a smaller molecule, acetamide [27], which is representative
of the set. In the semiempirical calculations the geometry of the
molecule has been optimized within each methodology. Only for the
ZINDO/S calculations was the AM1 optimized structure used. The ab
initio calculations were performed with Gaussian 94 [26], at the
6-31G(d), 6-31G(d,p), 6-31+G(d), 6-31+G(d,p), and 6-311+G(d,p) levels
of theory. The influence of electron correlation was analyzed at the
MP2 and CISD levels, for the geometry optimized at the 6-311+G(d,p) HF
level. In each case, orbital-based descriptions (Mulliken population
analysis, MPA [28], and natural population analysis, NPA [29]) and
charges from electrostatic potentials (CHelpG [30]) derived from ab
initio calculations have been compared with the MPA that follows each
semiempirical approach. Because the local charges are not quantum
mechanical observables, the calculated dipole moments were compared
with the experimental value (3.76 D [31]) as a way of testing the
accuracy of the calculations. From the comparison of the equilibrium
geometries, electronic properties, and even the description of the
resonance effect in acetamide [27, 32], we conclude that the AM1
results are the closest to ab initio, giving also a calculated value
for the dipole moment very close to experiment.

As will be further explained in the next section, the value of τ5
differentiates two well-defined conformations that are also
characterized by a different pharmacological response. AM1 calculations
have also been applied to calculate the energy barriers involved in the
mutual interconversion between those conformations for each
derivative. To this end, the torsional angle that defines the reaction
coordinate connecting the different conformations (τ5) has been varied
in 10° steps. For each step, this coordinate was kept frozen while the
others were fully optimized.
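
Once such a scan is available, the energy difference between the two
conformers and the barrier height follow from simple bookkeeping. The
sketch below, in Python, assumes a list of energies at frozen τ5 values
from 0° to 180° and measures the barrier from the more stable end
point, which is one possible convention. The energies used here come
from an illustrative two-term cosine potential, not from AM1
calculations, and all names are hypothetical.

```python
import math


def summarize_scan(angles_deg, energies):
    """Summarize a relaxed torsional scan running from one minimum to the other.

    Returns (delta_e, barrier, angle_at_top): the energy difference between
    the two end points, the barrier measured from the more stable end point,
    and the scan angle with the highest energy.
    """
    if len(angles_deg) != len(energies) or len(energies) < 3:
        raise ValueError("need matching angle/energy lists with at least 3 points")
    e_start, e_end = energies[0], energies[-1]
    top_index = max(range(len(energies)), key=energies.__getitem__)
    delta_e = abs(e_end - e_start)
    barrier = energies[top_index] - min(e_start, e_end)
    return delta_e, barrier, angles_deg[top_index]


def model_torsion_energy(angle_deg, v1=0.8, v2=2.0):
    """Illustrative two-term cosine potential in kcal/mol (not AM1 data)."""
    tau = math.radians(angle_deg)
    return 0.5 * v1 * (1.0 - math.cos(tau)) + 0.5 * v2 * (1.0 - math.cos(2.0 * tau))


angles = list(range(0, 181, 10))  # tau5 frozen every 10 degrees
energies = [model_torsion_energy(a) for a in angles]
delta_e, barrier, top = summarize_scan(angles, energies)
```

With a 10° grid the barrier is only located to within the step size; a
finer scan near the maximum or a transition-state search would be
needed for a sharper value.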

The selection of AM1 as the semiempirical procedure for the study of
systems of biological interest, including searches for transition
states, has also been validated against ab initio calculations by
R. Cachau [33].

Electronic descriptors have been derived from a Mulliken population
analysis [28] performed at the AM1 level. In spite of the lack of
precision of this analysis for absolute calculations, its results are
widely accepted, in this field, for the study of the trends in the
variation of the charges, on well-defined atomic centers, that follow
structural modifications performed on a parent structure [23, 24, 35].


Results  and  Discussion 

From  the  comparison  of  the  AE  activity  (Table 
I),  it  becomes  noticeable  that,  whereas  almost  all 
the  substituted  amides  (except  mvpd)  are  active 
against  MES,  they  are  not  (except  cpvpd)  active 
against  PTS.  The  different  response  against  both 
types  of  convulsions  defines  a  difference  between 
the  derivatives  vs.  vpa  and  vpd  (Table  I)  and  is 
indicative  of  a  different  reaction  mechanism. 
Within the MES model, buvpd, dievpd, chvpd, ipvpd, and cpvpd manifest
AE activity, which is higher, in general, than that of vpd. However,
one of the derivatives, the morpholin derivative, does not show any
protection against convulsion. This difference may contain the clue to
understanding the requirements associated with the AE activity. We have
devoted our research to finding the origin of this difference.

In agreement with previous calculations by us [23] and other authors
[36], we have found the greatest stability for the linear, all-trans
conformation of the valproyl moiety (Fig. 1). Further confirmation is
obtained from X-ray diffraction studies of one of the derivatives of
the series (cpvpd) [37].

TABLE I
Anticonvulsant activity, expressed as ED50, of the compounds analyzed.a

Compound   ED50, PTZ test            ED50, MES test
           Dose (µmol/kg)            Dose (µmol/kg)
vpa        1261 (1155-1377)          751 (526-1074)
vpd        385 b                     392 b
phen       NE c                      24 c
cpvpd      1847 (1491-2288)          1159 (1002-1340)
buvpd      NE (at 375 µmol/kg)       91 (55-152)
chvpd      NE (at 200 µmol/kg)       66 (36-121)
mvpd       NE (at 1550 µmol/kg)      NE (at 1500 µmol/kg)
dievpd     NE (at 375 µmol/kg)       128 (93-176)
ipvpd      NE (at 200 µmol/kg)       142 (114-177)

a ED50: 50% effective dose; NE: not effective; vpa, valproic acid; vpd,
valpromide; phen, phenytoin; cpvpd: N-(4-carboxyphenyl); buvpd:
N-butyl; chvpd: N-cyclohexyl; mvpd: N-morpholin; dievpd: N,N-diethyl;
ipvpd: N-isopropylvalpromide.

b S. Hadad, T. Vree, E. Kleijn, and M. Bialer, J. Pharm. Sci. 81(10)
(1992) 1047.

c E. Shek, T. Murakami, C. Nath, E. Pop, and N. Bodor, J. Pharm. Sci.
78(10) (1989) 837.

TABLE II
Relevant conformational data (dihedral angles, τ5, τ6) of the most
stable geometries of the structures analyzed.a

Compound   τ6 (deg)   τ5 (deg)   ΔE (kcal/mol)   b.h (kcal/mol)
vpa          0         -179.3      0.7             2.1
vpd         -0.22         0.46     0.8             2.2
phen       179.90      -176.28     -               -
cpvpd        5.87        -0.95     0.7             2.3
buvpd       -7.93         2.03     0.9             2.1
chvpd       -0.35        -2.02     0.8             2.1
mvpd        -1.11       169.23     1.3            19.0
dievpd       6.93       167.58     2.3             6.56
ipvpd       -0.10        -2.60     1.75            2.33

a Energy differences (ΔE) between synperiplanar and antiperiplanar
conformations, and energy required (b.h) for their mutual
interconversion. τ6 = O9C8NC, τ5 = O9C8C4H11.

In the carboxamide moiety, on the other hand, two minima were found
after the geometry optimization procedure, which are related to values
of τ5 close to 0° and 180°, respectively. This fact allows one to
distinguish between an "eclipsed" (synperiplanar) and an "opposite"
(antiperiplanar) conformation of O9 and H11 (Fig. 1). According to the
results of the AM1 calculations (Table II), the synperiplanar
conformation (τ5 close to 0°) is preferred by several active molecules
(vpd, cpvpd, buvpd, chvpd, ipvpd), whereas the nonactive morpholin
derivative is more stable in the opposite one. Far from being a
conformational requirement, the energy difference between both
orientations, lower in general than 1 kcal/mol (increasing to about
2 kcal/mol for dievpd and ipvpd), shows that both conformers can
coexist in equilibrium, provided that both are formed as a result of
their synthesis. The value of τ6 also defines two different
conformations. The more stable one is always related to the orientation
of the less voluminous N-substituent toward the hydrocarbon chain, for
both the synperiplanar and the antiperiplanar conformations. However,
no distinguishable pharmacologic activity is associated with them.
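
The statement that both conformers can coexist in equilibrium can be
illustrated with a two-state Boltzmann estimate. The Python sketch
below uses the ΔE values of Table II and assumes, for illustration
only, that ΔE can be treated as a free-energy difference between two
nondegenerate states at 298.15 K. These assumptions and the function
and variable names are not part of the original analysis.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)

# Energy differences between the two tau5 conformers, kcal/mol (Table II).
DELTA_E = {
    "vpa": 0.7, "vpd": 0.8, "cpvpd": 0.7, "buvpd": 0.9,
    "chvpd": 0.8, "mvpd": 1.3, "dievpd": 2.3, "ipvpd": 1.75,
}


def minor_conformer_fraction(delta_e, temperature=298.15):
    """Equilibrium fraction of the higher-energy conformer in a two-state model."""
    boltzmann = math.exp(-delta_e / (R_KCAL_PER_MOL_K * temperature))
    return boltzmann / (1.0 + boltzmann)


populations = {name: minor_conformer_fraction(de) for name, de in DELTA_E.items()}
for name, fraction in populations.items():
    print(f"{name:7s} minor conformer: {100.0 * fraction:5.1f}%")
```

Under these assumptions the less stable conformer remains populated at
the level of a few percent or more for every compound, which is
consistent with the qualitative argument in the text; whether the
equilibrium is actually reached depends on the barriers discussed next.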

Because we have found that the active-nonactive condition is related to the value of the τ5 angle, we have centered our study on the energetics associated with the mutual interconversion of the conformers defined by it. The calculated energy barriers (Table II) are very small for the active derivatives (~2 kcal/mol), but the barrier increases by roughly an order of magnitude for the nonactive morpholine derivative, due to steric repulsion between the voluminous morpholine substituent and the hydrocarbon chain in the intermediate configuration of the reaction path, defined by τ5 = 0°. It is noticeable that both the characteristics of the most stable conformation and the energy difference between the more stable ones seem to be determined by a steric factor associated with the size of the N-substitution. In the set of compounds that has been considered, single N-substitution has never led to nonactive structures. Disubstitution increases the barrier in both dievpd and mvpd, but the calculated barrier of the former is not high enough to consider it unable to accommodate to the requirements of the receptor. In agreement with this, the biological test (Table I) indicates that dievpd is AE active.
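
The contrast between the barriers in Table II can be made more tangible with a rough Boltzmann estimate. The following is only an illustrative sketch: the barrier heights are the b.h values of Table II, while the temperature (310 K) and the use of a bare factor exp(-b.h/RT) as a measure of how easily a barrier is crossed are assumptions introduced here, not part of the original analysis.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def boltzmann_factor(barrier_kcal, temperature_k=310.0):
    """Return exp(-barrier/RT) for a barrier given in kcal/mol.

    The temperature default (310 K) is an assumption made for illustration.
    """
    return math.exp(-barrier_kcal / (R_KCAL * temperature_k))


# Barrier heights (b.h, kcal/mol) copied from Table II.
barriers = {"vpd": 2.2, "dievpd": 6.56, "mvpd": 19.0}

for name, bh in barriers.items():
    factor = boltzmann_factor(bh)
    print(f"{name:7s} b.h = {bh:5.2f} kcal/mol   exp(-b.h/RT) = {factor:.2e}")
```

Under these assumptions the factor for mvpd is many orders of magnitude smaller than for vpd or dievpd, which is consistent with the qualitative argument that only the morpholine derivative is effectively locked in one conformation.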

The active molecules are characterized, hence, by soft degrees of rotational freedom and can accommodate themselves to the conformational requirements defined by the receptor site at a very low energy cost. Rotation around the C4-C8 bond is impeded, on the other hand, for the nonactive morpholine derivative, which will remain in the conformation that results from its synthesis. This conformation, once it is known, defines the structural characteristics of the nonactive structures.

At this point we can say nothing about the active conformation of the valpromide derivatives, because of the flexibility of the active structures, on the one hand, and the lack of knowledge of the conformation that results from the synthesis, on the other. The flexibility of the active molecules does not allow us to proceed to a superposition analysis based on the comparison of their structures, as all of them can be superimposed at a very low expense of energy. In order to assess the conformational characteristics of the active compounds, we have considered rigid analogs, rigidified by cyclization, whose conformation and AE activity are perfectly known [38]. To this end, we have included phenytoin (Fig. 2) in the series and used it as our template for the similarity analysis. It has a rigid antiperiplanar conformation, defined by the O9 and N atoms. The structure is rigidified in both the hydrocarbon chain and in the amide group, and is active against the MES test (Table I). It is nonactive against PTS, a fact that can be understood as indicative of an AE mechanism similar to that of the N-substituted valpromides analyzed in this series, except cpvpd.


FIGURE 2. Most stable conformation of phenytoin, a rigid analog, as obtained from AM1 calculations.

The structure of phenytoin partially overlaps with the antiperiplanar conformation of the substituted valpromides (Fig. 3), but not with the synperiplanar. Moreover, the charges on the atomic centers are almost the same for the overlapping portion. On the basis of this analysis, we can associate the AE activity (protection against MES) of the molecules under study with the antiperiplanar conformation. Hence, regardless of the conformation obtained by synthesis, the active structures are flexible and can adopt this active conformation without energy penalty. For the morpholine derivative, on the other hand, the synperiplanar conformation is very likely to be favored in the synthesis, because it avoids steric repulsion between the H atom of the valproyl moiety and the morpholine group. Its lack of flexibility prevents it from adopting the opposite configuration, and, in this way, its lack of activity can be understood.

We want to remark that the active conformations derived from the similarity analysis based on the comparison with phenytoin (antiperiplanar) are not the most stable ones that result from the conformational search, based on AM1 calculations (Table II). However, because of the flexibility of the molecules, the conformational analysis is not sufficient per se to establish the structural requirements of the active structures. We have based our conclusions on their comparison with phenytoin.


The superposition of the structure of phenytoin with the eclipsed and opposite conformations of the substituted valpromides (Fig. 3) allows one to define a pharmacophoric pattern for the latter, which is mainly associated with the opposite orientation of the carbonylic group relative to H11 (antiperiplanar orientation, Fig. 4). The nonzero AE activity of the N,N-diethyl valpromide demonstrates that the requirement of having a H atom bonded to the aminic nitrogen is not included in the definition of the pharmacophore. According to the superposition analysis, the straight conformation of the hydrocarbon chain does not appear to be a requisite for the AE activity either. The definition of the pharmacophore includes the carbon atoms of the aliphatic chain in α and β positions relative to the amide functionality. Although it does not include the seven carbon atoms of the valproyl moiety, it should be kept in mind that, within a series of monocarboxylic acids, vpa has the optimal chemical structure with regard to margins between its anticonvulsant activity and its sedative or hypnotic effects [39, 10]. Consequently, the importance of the size of the hydrocarbon chain cannot be disregarded.

The local charges on the atoms of the valproyl moiety are not significantly modified by substitution. As the charges remain the same along the series, including phenytoin, structural similarity in the definition of the pharmacophore implies electronic-structural similarity. This is in agreement with the premise that the receptor site perceives the electronic distributions approaching it.


FIGURE 3. Superposition of the structure of phenytoin with the opposite conformation (carbonyl group relative to H11) of the substituted valpromides, exemplified by vpd. A stick model has been chosen because it is the only one that allows the overlap to be visualized. Thick line: vpd; broken line: phenytoin.


FIGURE 4. Molecular portion that defines the structural requirement associated with the AE activity (pharmacophore). X refers to H for the set of substituted valpromides and to N for phenytoin. The local charges on the most relevant atoms are shown.

According to the previous description, our conclusions have been mainly derived from the comparison of the syn- and antiperiplanar conformers of the derivatives of the series with phenytoin (rigid analog), considering both the nuclear coordinates and the local charges on the atomic centers. The superimposable portion of the active derivatives defines the pharmacophore.

Having identified the pharmacophore associated with the AE activity of the N-substituted valpromides, the next step is to find a structural or electronic quantifier that, correlating with the AE potency, would allow us to design new derivatives with a predetermined activity. This information has to be derived from a QSAR analysis. Because more data (derivatives) are necessary for this analysis to become statistically significant, research in our lab is presently directed toward the synthesis of new derivatives, which are being designed on the basis of the knowledge of the pharmacophoric pattern.


Concluding  Remarks 

In this article we have presented a conformational and electronic study of several N-substituted valpromides, followed by a similarity analysis that, taking into account the flexibility of the molecules, has considered a rigid analog (phenytoin) as template. Phenytoin has been included in the analysis, and both the position of the nuclei and the local charges on the atomic centers have been compared with those of the active and inactive structures. This procedure has allowed us to define a pharmacophore, related to the antiperiplanar orientation of the amide function relative to the hydrocarbon chain, and defined by the structural portion shown in Figure 4.

On the basis of the knowledge of the pharmacophoric pattern, new derivatives are being synthesized in our lab in order to be able to perform a statistically significant QSAR analysis.


ABSTRACT: A model system whose density of states is an analytical function of the potential energy is obtained by combining potential energy wells given by Lennard-Jones 6-12 potentials representing pairwise interactions between atoms and circular barriers. Structural aspects of polypeptide chains such as sharp and broad energy extremes and close-packed and loose-packed conformations are simulated. By changing Lennard-Jones parameters, the density of states is described as a function of topological features of the potential energy surface and rules used to interpret density of states calculations are derived. Important results are that the number of clusters of density of states maxima in a given energy range approaches the number of conformational families and very low density of states gaps indicate the existence of kinetic barriers. These conclusions are applied to the conformational analysis of α-MSH. Structural implications are discussed.
© 1997 John Wiley & Sons, Inc. Int J Quant Chem 65: 1115-1124, 1997


Introduction

The potential energy landscape (PEL) contains relevant information from which it could be possible, in principle, to extract all equilibrium and kinetic properties obtainable by theoretical means. However, a combinatorial explosion and/or the multiple minimum problem are faced in any attempt to carry out a detailed analysis of its topology. Although we may adopt a semiempirical potential energy function, it is not always possible to use it without further approximations. Recently, it was shown [1] that there are cases for which it is possible to find the global energy minimum. Even considering that the protein-folding problem can be solved when the absolute minimum is reachable, the study of structure-function relationships presents a greater complexity, since, in many cases, the exercise of a function requires a structural transition, i.e., more than one energy minimum is involved.

The density of states or energy distribution may also be used to calculate equilibrium and kinetic properties, even though the interpretation of density of states calculations presents some ambiguities, such as whether [2-4] a zero density of states gap indicates a folding tendency. Part of the difficulty is that, whereas the semiempirical potential energy surface is an analytical function, the density of states is, in most cases, a numerical function. In this article, we present a model that enables an analytical solution of the density of states as a function of the internal potential energy. We developed the idea, initially part of models used to study the dynamics of liquids [5], that the density of states obtained analytically or numerically as a function of the potential energy may be employed to investigate the existence of extremes in the potential energy landscape of a polypeptide chain within a chosen energy range.

The results obtained with the present model system are compared to calculations of a polypeptide chain. The 13-mer polypeptide chain of α-MSH was chosen as a suitable example. There are experimental and theoretical data related to the binding [6, 7] of α-MSH to biological membranes, indicating that α-MSH undergoes a structural transition as it moves from one chemical environment to the other.

A previous study [8] of the most probable conformational families adopted by α-MSH showed that a broad sampling of the conformational space necessary to describe a conformational transition may be achieved with an unbiased conformational analysis in which no initial attempt is made to favor the occurrence of minimum-energy structures. Only after the determination of conformational families was an energy minimization procedure adopted to find stable conformations. This same example is used here to show how the results of a conformational analysis may be interpreted with density of states calculations.


Theory:  Analytical  Solution  of  a 
Model  System 

Even for the simplest dipeptide chain, it is not possible to derive the density of states $\eta$ as an analytical function of the potential energy $E$. To overcome this limitation, we devised a model devoid of unnecessary complications that contains the essential characteristics that we want to consider.

Let us begin with this model, depicted in Figure 1, for which $\eta$ is an analytical function of $E$. Our main assumption is that each atom interacts with barriers formed by the other atoms. That it should be so is justified by the consideration that in close-packed structures each atom is in the vicinity of many surrounding atoms. On the other hand, when loose-packed flexible chains are being considered, the assumption of barriers also accounts for an averaging of atomic interactions in a large set of different conformations.


FIGURE 1. The model system. $l_i$ are the radii of circular barriers as described in the Theory section. (a) Extended loose-packed conformational family generated with $l_1 < l_2 < \cdots < l_n$; (b) coiled close-packed conformational family generated with $l_{1+k} < l_{2+k} < \cdots < l_{n+k}$, $n = 1, 2, \ldots$; $k = 0, 1, 2, \ldots$

An important consequence of this approximation is that the potential energy may be considered as a function of distances between barriers and atoms, $d_{ij} = l_j - \sqrt{x_i^2 + y_i^2}$, instead of interatomic distances, $d_{ij} = \sqrt{x_{ij}^2 + y_{ij}^2}$. By making the further assumption that, as shown in Figure 1, the barriers are circles of radius $l_j$, we confer to the model system the circular symmetry that makes possible an analytical solution.

As shown in Figure 1(a), if the radii $l_i$ follow an increasing order ($l_1 < l_2 < \cdots < l_N$), the chain molecule is forced to adopt extended conformations. On the other hand, if, as shown in Figure 1(b), the $l_i$ have a periodicity similar to $l_{1+k} < l_{2+k} < \cdots < l_{n+k}$, $k = 0, 1, 2, \ldots$, a variety of coiled conformations becomes possible. Configurations of the model system generated with different sets of $l_j$'s will be referred to as conformational families.
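
The geometry of the model can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 6-12 form $V = a/d^{12} - b/d^{6}$ is assumed here because Eq. (A.1) is not reproduced in this section, the pairing rule (atom $i$ interacts with barrier $j$ for $i < j$) is one reading of $E = \sum_{i<j} V_{ij}$, and the coordinates and radii in the example are hypothetical. The parameter values $a = 0.1$ and $b = 1$ are those quoted in the figure captions.

```python
import math


def barrier_distance(l_j, x_i, y_i):
    """Distance d_ij = l_j - sqrt(x_i^2 + y_i^2) between atom i and circular barrier j."""
    return l_j - math.hypot(x_i, y_i)


def lj_6_12(d, a=0.1, b=1.0):
    """Assumed 6-12 form V = a/d^12 - b/d^6 (Eq. (A.1) itself is not shown here)."""
    if d <= 0.0:
        raise ValueError("atom must lie inside the barrier (d > 0)")
    return a / d**12 - b / d**6


def total_energy(atoms, radii, a=0.1, b=1.0):
    """Sum V_ij over atom i and barrier j with i < j (an assumed pairing rule)."""
    energy = 0.0
    for i, (x_i, y_i) in enumerate(atoms):
        for j in range(i + 1, len(radii)):
            energy += lj_6_12(barrier_distance(radii[j], x_i, y_i), a, b)
    return energy


# Hypothetical three-atom extended arrangement with increasing radii l_1 < l_2 < l_3.
atoms = [(0.5, 0.0), (1.0, 0.0), (1.5, 0.0)]
radii = [1.3, 1.8, 2.3]
print(total_energy(atoms, radii))
```

Because each term depends on an atom only through its distance from the origin, the energy is unchanged when an atom is rotated about the center, which is the circular symmetry the text refers to.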

In the Appendix, the following equation is derived for the density of states $\eta(E)$ as a parametric function of the potential energy $E$:


$\eta(E)$

where […] and S2; are defined in Eqs. (A.6) and (A.13), $V_{ij}$ are 6-12 Lennard-Jones potentials [Eq. (A.1)], $a_{ij}$ and $b_{ij}$ are the parameters of these potentials, and $E = \sum_{i<j} V_{ij}$. The minus and plus signs on the right-hand side apply, respectively, to positive and negative values of […]. Besides being a function of $E$, the density of states depends on the Lennard-Jones parameters of Eq. (1) and on the radii $l_j$ of the circular barriers, as well.

If instead of this simple model used to derive Eq. (1) we started from a real polypeptide chain, we would need to make use of the equation

$$A = \int \cdots \int_{D} S(x_1, x_2, \ldots, x_N)\, dx_1\, dx_2 \cdots dx_N \qquad (2)$$

to calculate the surface area of the potential energy landscape [$S$ is given by Eq. (A.8), $x_i$ are internal degrees of freedom, and $D$ is the domain encompassed by the $x_i$]. The evaluation of the integrals in this equation is very difficult, if not impossible, and the density of states of a polypeptide chain is obtainable only as a numerical function of the potential energy surface. However, Eq. (2) shows that $\eta(E)$ may be seen as a projection of the potential energy landscape onto a two-dimensional space.


Methods 

Five main-chain rotamers (A: $\phi = -57$, $\psi = -47$; B: $\phi = -139$, $\psi = 135$; G: $\phi = -60$, $\psi = -30$; D: $\phi = -90$, $\psi = 0$; and E: $\phi = 70$, $\psi = -60$) were employed in the search. For the side-chain conformations, we made use of the gauche minus, trans, and gauche plus rotamers of the Ponder and Richards classification (the numerals 1, 2, and 3 indicate, respectively, the gauche minus, trans, and gauche plus rotamers of $\chi_1$) [9]. Gas-phase energies were calculated with the ECEPP/2 force field [10]. Hydration energies were calculated with the hydration potentials of Ooi et al. [11] and the Connolly routine [12] for surface area computation. Matrix operations described in the literature [13] were employed in the conformational search with the following procedure: An initial set of conformations belonging to the amino end tetrameric fragment of the peptide chain is generated with a matrix operation. This set undergoes a selection in which those chain conformations populating density of states maxima and minima are selected, resulting in a much smaller set of conformations. Each of these latter chain conformations then becomes part of a matrix operation in which the structure of the amino end tetrameric fragment is frozen whereas the structure of the remaining part of the chain is varied. The resulting set of conformations is screened again following the same criterion of density of states maxima and minima. The same process is repeated until the end of the chain.
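
The first step of the procedure above (unbiased generation of all conformations of a tetrameric fragment, followed by binning of their energies into a numerical density of states) can be sketched as follows. This is only an illustration under stated assumptions: the rotamer labels A, B, G, D, E are those of the Methods section, but `toy_energy` is a hypothetical stand-in with invented numbers, not the ECEPP/2 force field, and plain enumeration with `itertools.product` replaces the matrix operations of [13], which are not described here.

```python
import itertools
import math
from collections import Counter

# Main-chain rotamer labels used in the Methods section.
ROTAMERS = "ABGDE"


def toy_energy(conf):
    """Hypothetical stand-in energy (NOT ECEPP/2); the numbers are invented."""
    per_residue = {"A": -1.0, "B": -0.5, "G": -0.8, "D": 0.2, "E": 0.6}
    energy = sum(per_residue[r] for r in conf)
    # Toy nearest-neighbor term: identical adjacent rotamers are slightly favored.
    energy += sum(-0.3 for p, q in zip(conf, conf[1:]) if p == q)
    return energy


def density_of_states(energies, bin_width):
    """Numerical eta(E): number of conformations falling in each energy bin."""
    counts = Counter(math.floor(e / bin_width) for e in energies)
    return {k * bin_width: n for k, n in sorted(counts.items())}


# Unbiased enumeration of a tetrameric fragment: 5**4 = 625 conformations.
conformations = list(itertools.product(ROTAMERS, repeat=4))
energies = [toy_energy(c) for c in conformations]
eta = density_of_states(energies, bin_width=0.5)

for lower_edge, count in eta.items():
    print(f"{lower_edge:6.2f}  {count}")
```

In the build-up procedure of the text, the conformations populating the maxima and minima of such a histogram would then be retained and extended residue by residue; that selection step is not reproduced in this sketch.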


Results  and  Discussion 

INTERPRETATION  OF  DENSITY  OF  STATES 
CALCULATIONS 

By choosing appropriate values of the parameters in Eq. (1), it is possible to reproduce various aspects of the structures adopted by polypeptide chains such as sharp and broad energy minima and close-packed and loose-packed conformations. A detailed examination of these calculations indicates how to find correspondences between features of potential energy surfaces and density of states plots.

A comparison between the density of states and the potential energy surface may be carried out for one (i.e., for one set of $l_j$'s) or many conformational families. The most important features that we want to consider are the number and location of density of states maxima and minima and their relative magnitudes, compared to the location of extremes in the potential energy surface.

The potential energy surface given by Eq. (A.3) is rugged, since it increases abruptly whenever an atom approaches a barrier. On the other hand, the single atom interaction energy $V_{ij}$ is smooth and has a minimum when the distance ($d_{ij}$) between atom $i$ and barrier $l_j$ is […]. To have a smooth representation of the potential energy surface, we have defined a parameter $x$ so that

[…]

and $V_{ij}$ is a minimum when $x = 1$. The potential energy $E$ is plotted as a function of $x$ considering that all $d_{ij}$'s change in concert with $x$, from $x = 0$ to $x = \infty$.

In Figure 2, the coexistence of two different conformational families is illustrated. As shown in Figure 2(b) and (c), low-energy density of states maxima are populated mainly by the lowest-energy conformations of each conformational family. According to these results, clusters of density of states maxima separated by very low density of states gaps may be interpreted as distinct conformational families, and a gap in $\eta$ indicates the presence of a kinetic barrier [as shown in Eqs. (A.15) and (A.16)]. This same analysis shows, however, that if the minimum-energy conformations of different conformational families have similar energies the corresponding clusters of density of states maxima may overlap. In the example depicted in Figure 2(d), there is a partial overlap of density of states maxima.
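
The rule stated above (clusters of maxima separated by very low density of states gaps) can be applied mechanically to a binned $\eta(E)$. The sketch below is an assumption-laden illustration: the function name, the gap threshold, and the example histogram are all hypothetical, and it only locates runs of populated bins; deciding whether each run corresponds to a single conformational family still requires inspecting the structures, as discussed in the text.

```python
def clusters_of_maxima(eta, gap_threshold=0):
    """Split a binned density of states into runs of bins separated by gaps.

    eta           -- list of counts, ordered by increasing energy
    gap_threshold -- bins with a count <= this value are treated as a gap
    Returns a list of (first_bin, last_bin) index pairs, inclusive.
    """
    clusters = []
    start = None
    for i, count in enumerate(eta):
        if count > gap_threshold:
            if start is None:
                start = i  # a new cluster begins at this bin
        elif start is not None:
            clusters.append((start, i - 1))  # the gap closes the current cluster
            start = None
    if start is not None:
        clusters.append((start, len(eta) - 1))
    return clusters


# Hypothetical histogram: a small low-energy cluster, a gap, then a large cluster.
example_eta = [0, 3, 7, 2, 0, 0, 5, 40, 120, 90]
print(clusters_of_maxima(example_eta))  # [(1, 3), (6, 9)]
```

Raising `gap_threshold` merges nothing but splits more readily, so a "very low" rather than strictly zero gap can be explored by varying it.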

In Figure 3, the model system is allowed to occupy energy wells corresponding to the configurations represented in Figure 1(a) and (b). The lowest potential energy well corresponds to the coiled and close-packed configuration [Fig. 1(b)], whereas the highest potential energy well (which is also broader) shown in Figure 3(a) corresponds to the extended and loose-packed configuration [Fig. 1(a)]. A comparison between Figure 3(b) and (c) shows that the loose-packed conformational family has a much larger density of states, as one would expect intuitively. The sudden increase of $\eta(E)$ seen in Figure 3(d) when the energy increases above the threshold where the transition from coiled to extended conformational families takes place is seen in the present (see Fig. 4) and in previous calculations [2-4, 14, 15].

The above-described analysis shows that $\eta \times E$ plots have an inherent ambiguity. Density of states gaps should correspond to a folding tendency if there are clusters of density of states maxima separated by these gaps populated by unique conformational families, and in this case, it also indicates the existence of a kinetic barrier [see Eqs. (A.14)-(A.16) of the Appendix for details]. However, the same conformational family may populate more than one cluster of density of states maxima, in which case a folding tendency should not exist.

Usually, it is not possible to tell the two situations apart just by examining $\eta \times E$ plots, and some other source of evidence is necessary. As shown in the next section, by combining density of states calculations with a molecular graphics study of the conformations that belong to different regions of the $\eta \times E$ plot, we may execute a detailed conformational analysis.

APPLICATION  TO  POLYPEPTIDE  CHAINS 

The same basic principles obtained from the above examples are considered to be valid for real systems. Our main purposes were to identify conformational families, to evaluate their relative stabilities, and to determine the presence of kinetic barriers between structural transitions. This is achieved by generating $\eta \times E$ plots such as those shown in Figure 4.

Details of the procedure used in these calculations are discussed in the Methods section and in [13]. In principle, any algorithm that fulfills the requirement of an unbiased conformational search may be used to calculate the density of states. In this article, we used the matrix algorithm [13] for this purpose.


FIGURE 2. (a) Potential energy wells of two extended configurations of the model system. $\eta \times E$ plots are shown: (b) the lower-energy well corresponds to a configuration having five atoms, $a_{ij} = 0.1$, $b_{ij} = 1$, and $l_1 < l_2 < \cdots < l_5$; (c) corresponds to an extended configuration having the same number of atoms and same Lennard-Jones parameters and radii $m_i = l_i + 1$; (d) $\eta \times E$ plot showing a sum of the $\eta(E)$ values belonging to the described conformational families.


The existence of three α-MSH populations, or conformational families, in aqueous and lipidic environments is supported by theoretical and experimental [6, 7] evidence. This result has been interpreted [6-8] as being directly related to the three rotamer conformations of $\chi_1$ of Trp 9.

In Figure 4, we also show the energies for which the most stable conformations belonging to each of the three rotamers of $\chi_1$ of Trp 9 were found. The trans rotamer of Trp 9 lies at the first density of states minimum, whereas the gauche minus and gauche plus rotamers are just after the second density of states minimum.

The existence of a density of states minimum separating two maxima indicates a folding tendency [2-4] if each maximum (or, presumably, the lowest-energy maximum) is populated by only one conformational family. As shown in Table I, the lowest-energy density of states maximum is populated only by conformations adopting the trans rotamer of $\chi_1$ for the side-chain rotamer of Trp 9. However, regions (b) and (c) belonging to the second density of states maximum are, respectively, populated by gauche minus and trans and by gauche plus and trans rotamer conformations of Trp 9.

These results indicate that α-MSH has a folding tendency favoring the trans rotamer of Trp 9, even though it is not possible to point to a kinetic barrier because there is no zero (or very small) density of states minimum. A tendency to one conformation or conformational family might change if α-MSH were allowed to coexist in the aqueous solution and a lipidic phase. To examine these questions, we calculated the hydration energies of all conformations within the energy range delimited by the energy intervals (a) and (c) of Figure 4(i). As shown in Figure 4(ii), the energy difference between the energy intervals in which chain conformations (a) and (b) [and (c)] are found increased by 5 kcal/mol when hydration energies are considered, an indication that the trans rotamer of Trp 9 is stabilized in an aqueous solution.


FIGURE 3. (a) Potential energy wells of two configurations of the model system. $\eta \times E$ plots are shown: (b) the lower-energy well corresponds to a coiled configuration having nine atoms, $a_{ij} = 0.1$, $b_{ij} = 1$, and a turn at every three atoms, whereas the higher-energy well (c) corresponds to an extended configuration having the same number of atoms and same Lennard-Jones parameters; (d) $\eta \times E$ plot showing a sum of the $\eta(E)$ values belonging to the described conformational families.

In Figure 5, we show the most stable conformations belonging to the three populations discussed above. It is seen that transitions between the rotamer conformations of Trp 9 side chains are associated with extensive structural transitions encompassing the entire chain.


Conclusion 

One of the main advantages of the application of density of states calculations to the conformational analysis of polypeptide chains is that, no matter how long the chain is, the density of states is always a function of the internal energy, and the $\eta \times E$ plot is a two-dimensional graph; this is a consequence of the density of states being a projection of the potential energy landscape.


FIGURE 4. $\log(\eta) \times E$ plots calculated for α-MSH with (i) ECEPP/2 gas-phase potentials and (ii) ECEPP/2 [11] gas-phase potentials plus hydration [12] potentials. (a), (b), and (c) are the energy intervals where the most stable conformations adopting, respectively, the trans, gauche minus, and gauche plus conformers of Trp 9 are found. The energy $E$ is rescaled so that the minimum-energy conformation is the zero-energy level.

However, the integrals of Eq. (2) cause a loss of information, and the larger the number of internal degrees of freedom, the more information is lost. The information that is retained is that most needed in the study of structure and function relationships, e.g., the number of conformational families in the considered energy range and their relative stabilities.

An important consequence of Eqs. (A.15) and (A.16) is that to every local energy minimum corresponds a density of states maximum. Density of

TABLE I

Ten most stable conformations belonging to the energy intervals indicated as (a), (b), and (c) in Figure 4, which correspond to the trans, gauche minus, and gauche plus conformations of $\chi_1$ of Trp 9; the same conformations are depicted in Figure 5.

Interval   Main chain (a)    Side chain (a)    Energy (b)

(a)        GAAAAAGDDDBGB     1121122121312     -0.4645100E+05
           GAAAAAGDDDBGB     1121122121313     -0.4642300E+05
           GAAAAAGDDDBGB     1121122121311     -0.4602900E+05
           GAAAAAGDDDBAG     1121122121313     -0.4587100E+05
           GAAAAAGDDDBAA     1121122121313     -0.4570200E+05
           GAAAAAGDDDBBB     1121122121312     -0.4545100E+05
           GAAAAAGDDDBGG     1121122121312     -0.4543300E+05
           GAAAAAGDDDBAB     1121122121312     -0.4533500E+05
           GAAAAAGDDDBAB     1121122121313     -0.4530400E+05
           GAAAAAGDDDBGA     1121122121312     -0.4510200E+05

(b)        DDAGDGAAAABBB     1111222211111     -0.3038100E+05
           DDAGDGAAAABGB     1111222211112     -0.3029900E+05
           BGGGDAAAAABBB     1212221221313     -0.3029200E+05
           BGGGDAAAAABGG     1212221221312     -0.3028900E+05
           DDAGDGAAAABGB     1111222211113     -0.3025200E+05
           ADABGAAAAABAG     3331112221113     -0.3021800E+05
           BGGGDAAAAABAB     1212221221312     -0.3021500E+05
           ADABGAAAAABBB     3331112221112     -0.3021000E+05
           ADABGAAAAABBG     3331112221113     -0.3019700E+05
           BGGGDAAAAABBB     1212221221112     -0.3018500E+05

(c)        DDGBDGBBGABBB     2121322131312     -0.2591100E+05
           GDDBBDAAAABDB     1111112221113     -0.2591000E+05
           BGGGDAAAAABBA     1212221221311     -0.2591000E+05
           ADABGAAAAABBA     3331112221111     -0.2591000E+05
           ADDBBDAAAABDB     1111112221113     -0.2590900E+05
           GDDBBDAAGABGB     1111112221313     -0.2590800E+05
           BGABBDGAAABBG     1231322221112     -0.2590800E+05
           ADDBBDAAGABGB     1111112221313     -0.2590600E+05
           ADBBBBGAAABAA     2131112121113     -0.2590600E+05
           DDAGDGAAAABEG     1111222211212     -0.2590500E+05

(a) Letters and numerals indicate, respectively, monomer residue main-chain and side-chain conformers as defined in the Methods section.

(b) ECEPP/2 gas-phase energy (kcal/mol).


FIGURE 5. Most stable conformations of α-MSH found in the present conformational search adopting the (a) trans, (b) gauche minus, and (c) gauche plus conformations of $\chi_1$ of Trp 9: (a) GAAAAAGDDDBGB, 1121122121312; (b) DDAGDGAAAABBB, 1111222211111; (c) DDGBDGBBGABBB, 2121322131312. Letters and numerals indicate main-chain and side-chain rotamers (see Methods section for a definition). In Figure 4, the corresponding energy intervals are indicated by the arrows. Color coding: white: hydrogen; black: carbon; blue: nitrogen; red: oxygen; yellow: sulfur.


states maxima belonging to close energy minima cluster and become indistinguishable. We can still postulate, however, that the number of clusters of the density of states maxima is less than or equal to the number of conformational families.

A  similarity  between  the  presently  shown 
results  and  previous  calculations  is  seen,  for  in¬ 
stance,  in  conformational  searches  [15]  with  a  lat¬ 
tice  model  and  reduced  energy  representations.  As 
in  the  present  treatment,  conformations  are  gener¬ 
ated  regardless  of  their  energies  and  subsequently 
classified  in  the  order  of  increasing  energy.  In  all 
cases,  the  native  structure  belongs  to  a  small  maxi¬ 
mum  in  the  low-energy  side  of  energy  distribu¬ 
tions.  Another  example  is  the  energy  distribution 
of  normal  modes  of  vibration  [14]  which  shows 
that  near-  and  far-from-equilibrium  states  belong 
to  distinct  regions  of  density  of  states  vs.  energy 
plots. 


Appendix:  The  Model  System  Density 
of  States  vs.  Energy  Plot 

The pairwise molecular mechanics interactions between atoms i and j are functions of the interatomic distances $d_{ij} = \sqrt{x_{ij}^2 + y_{ij}^2}$. However, in the model system depicted in Figure 1, we consider the interaction between atom i and circular barriers formed by the remaining atoms. This approximation confers on our model the circular symmetry that makes possible an analytical solution of the problem of describing the density of states as a function of the potential energy surface. The interatomic interactions are replaced by interactions between atoms and circular barriers of radius $l_j$. The proper distances are

$$d_{ij} = l_j - \sqrt{x_i^2 + y_i^2}. \tag{A.1}$$

By applying this approximation to all pairwise interactions, the nonbonded contribution to the potential energy becomes

$$E(x_1, y_1, x_2, y_2, \ldots) = \sum_{i<j} V_{ij}, \tag{A.2}$$

where

$$V_{ij} = \frac{a_{ij}}{d_{ij}^{12}} - \frac{b_{ij}}{d_{ij}^{6}} \tag{A.3}$$

and $a_{ij}$ and $b_{ij}$ are Lennard-Jones parameters.

Figure 1 shows how different extended and coiled conformations may be generated with a proper choice of the radii $l_j$. As shown in the following derivation, the density of states becomes a function of the Lennard-Jones force-field parameters $a_{ij}$ and $b_{ij}$ and of the structural parameters $l_j$ as well.

Let us consider the region of the potential energy surface delimited by the conditions

$$0 < x_i < [\ldots] \tag{A.4}$$

$$[\ldots] < y_i < \sqrt{(l_j - H_{ij})^2 - x_i^2}, \tag{A.5}$$

where

$$H_{ij} = \left[ \frac{-b_{ij} \pm \sqrt{b_{ij}^2 + 4\, a_{ij}\, V_{ij}}}{2\, V_{ij}} \right]^{1/6}. \tag{A.6}$$

Within this region, the potential energy given by Eq. (A.3) has a lower value of E.

The number of states that may be occupied by atom i within the boundaries defined by Eqs. (A.4) and (A.5) is proportional to the surface area:

$$A_i = \sum_j \iint S_{ij}(x_i, y_i)\, dx_i\, dy_i, \tag{A.7}$$

where

$$[\ldots] \tag{A.8}$$

The total number of conformations available to the model system within this region is then proportional to the total surface area $A = \sum_i A_i$, and the density of states is given by

$$\eta(E) = \frac{dA}{dE} = \frac{dA/dV_{ij}}{dE/dV_{ij}}. \tag{A.9}$$

Replacing this relation in (A.7) and applying the Leibnitz formula [16], we obtain

$$\eta(E) = \sum \int S_{ij}\bigl(x_i, I_{ij}(x_i, V_{ij})\bigr)\, [\ldots] \tag{A.10}$$

where

$$I_{ij}(x_i, V_{ij}) = \sqrt{(l_j - H_{ij})^2 - x_i^2}. \tag{A.11}$$

In this simple instance, it is possible to integrate Eq. (A.10) analytically:

$$\eta(E) = \sum_{i<j} S_{ij}(V_{ij}),$$

where

$$S_{ij}(V_{ij}) = [\ldots] \tag{A.12}$$

$$[\ldots] \tag{A.13}$$

is obtained by replacing Eq. (A.11) in the term $S_{ij}(x_i, I_{ij}(x_i, V_{ij}))$ of Eq. (A.10).

$\eta \times E$ plots may then be obtained for various values of the parameters in Eq. (A.12). A numerical analysis of $\eta(E)$ showed that the behavior of the first term on the right-hand side of Eq. (A.12) is dominated by the expressions

$$T(V_{ij}) = [\ldots] \tag{A.14}$$

$T(V_{ij})$ obeys the following limits when the second term on the right-hand side is subtracted from the first:

$$\lim_{V_{ij} \to -b_{ij}^2/4a_{ij}} T(V_{ij}) = -\infty, \qquad \lim_{V_{ij} \to 0} T(V_{ij}) = 0, \qquad \lim_{V_{ij} \to \infty} T(V_{ij}) = 0, \tag{A.15}$$

and when it is added:

$$\lim_{V_{ij} \to -b_{ij}^2/4a_{ij}} T(V_{ij}) = \infty, \qquad \lim_{V_{ij} \to 0} T(V_{ij}) = \infty, \qquad \lim_{V_{ij} \to \infty} T(V_{ij}) = 0. \tag{A.16}$$

$-b_{ij}^2/4a_{ij}$ is (for large $l_j$) the value of the energy minimum of $V_{ij}$.
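
The role of the energy minimum in the limits above can be checked numerically. The following is a minimal sketch, assuming the standard 12-6 Lennard-Jones form V = a/d^12 - b/d^6 for the atom-barrier interaction; the function names and the parameter values used in the usage note are hypothetical and are not taken from the article. It returns the position and depth of the minimum and the distances at which the pair energy equals a chosen value, which are the roots of V d^12 + b d^6 - a = 0.

```python
import math


def lj_energy(d, a, b):
    """12-6 Lennard-Jones pair energy V = a/d**12 - b/d**6 (assumed form)."""
    return a / d**12 - b / d**6


def lj_minimum(a, b):
    """Return (d_min, V_min): d_min**6 = 2a/b and V_min = -b**2/(4a)."""
    d_min = (2.0 * a / b) ** (1.0 / 6.0)
    return d_min, -b * b / (4.0 * a)


def distances_at_energy(v, a, b):
    """Distances d > 0 with V(d) = v, from v*d**12 + b*d**6 - a = 0."""
    if v == 0.0:
        # The quadratic degenerates: b*d**6 = a.
        return [(a / b) ** (1.0 / 6.0)]
    disc = b * b + 4.0 * a * v
    if disc < 0.0:
        # v lies below the energy minimum -b**2/(4a): no real distance.
        return []
    roots = set()
    for sign in (1.0, -1.0):
        d6 = (-b + sign * math.sqrt(disc)) / (2.0 * v)
        if d6 > 0.0:
            roots.add(d6 ** (1.0 / 6.0))
    return sorted(roots)
```

With the illustrative values a = 1 and b = 2, the minimum lies at d = 1 with V = -1; an energy between the minimum and zero gives two distances (one on each side of the minimum), a positive energy gives one, and an energy below the minimum gives none.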
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ABSTRACT:  In  this  work,  we  modeled  leupeptin  (Ac.Leu.Leu.Arg.CHO),  a  natural 
inhibitor of proteases, and the active site of papain, a cysteine protease, using as a template the crystal structure of a leupeptin-papain complex recently obtained by Schroder and co-workers [FEBS Lett. 315(1), 38 (1993)] and including 11 amino acids relevant to the proteolytic activity of the enzyme. Our results show that the AM1 fully optimized leupeptin is more stable than the leupeptin crystal structure by about 6.0 kcal/mol. Our results also show that in the modeled active center of papain the S-H···N structure is favored. When the aldehyde is included in the calculation, however, proton transfer occurs with a strengthening of the S⁻···HIm⁺···O=C (Asn175) catalytic triad. The AM1 method reproduces fairly well the interactions between the enzyme and the host molecule. © 1997 John Wiley & Sons, Inc. Int J Quant Chem 65: 1125-1134, 1997

Key  words:  leupeptin;  papain;  semiempirical;  cysteine  protease;  active  center 


Introduction 

Leupeptin is an acyltripeptide aldehyde (Ac.Leu.Leu.Arg.CHO) of microbial origin that has been used to investigate the possible role of proteases in several important biological processes [1-3]. Leupeptin anticancer activity is poorly understood and even less is known about its steroid binding inhibition activity [4, 5]. Leupeptin inhibits serine proteases such as plasmin and trypsin, as well as cysteine proteases, papain among others [6, 7]. A few years ago, Schroder and co-workers obtained the X-ray crystal structure of the leupeptin-papain complex at 2.1 Å resolution [8] and, very recently, Kurinov and Harrison reported X-ray structures for two different crystal forms of the leupeptin-trypsin complex at 1.7 Å resolution [5].

Correspondence to: R. Bicca de Alencastro.

Contract grant sponsors: CNPq (Brazil); FAPERJ (Brazil).

It is a great concern that malaria, a disease responsible for millions of deaths per year in tropical areas of the world, is consistently spreading, the main cause of which is the increasing resistance of Plasmodium falciparum to conventional drug therapy [9-13]. Proteases, on the one hand, are critical in many steps of the biological cycle of parasites, and this makes them attractive for use in the development of new drugs [11, 14-18]. Cysteine proteases, on the other hand, have been used in recent years as promising targets for new therapies (see [19] and references therein). Papain from Carica papaya, a representative of a superfamily that is predominant in eukaryotes, the active center of which is highly conserved [17], is attractive for study because its three-dimensional structure is well known [20-23]. The catalytic mechanism of this enzyme, first proposed by Drenth and co-workers [20, 21], has been the object of extensive experimental [24, 25] and theoretical work [26-35].

In the present article, we discuss our AM1 results on leupeptin and on a model of the leupeptin-papain complex, including 11 amino acids relevant to the active center of the enzyme. Our aim was to better understand some of the details of complex formation, having in mind the modeling of new antimalarials.


Method  and  Computational  Details 

All calculations were carried out at the SCF level using the AM1 [36-38] semiempirical Hamiltonian within the MOPAC 7.0 program [39] on IBM RISC 6000 workstations or within the UNICHEM package on a Cray J90 (NACAD-COPPE-UFRJ). The coordinates of the leupeptin-papain complex [8] were obtained from the Protein Data Bank.


Results  and  Discussion 

The cysteine proteases are a group of proteolytic enzymes whose activity depends on a free thiol group of a cysteine residue [40]. The best-known cysteine proteases are obtained from plant sources. Among those, papain has become a model enzyme in enzymology [22, 23, 41] and in structure-activity studies [26-35]. We briefly describe the main features of the active site of papain, as viewed by X-ray crystallography and enzymology studies (see [19] and references therein; for an extensive discussion of the catalytic mechanism of cysteine peptidases, see [22]). Unfortunately, X-ray structures of proteins are often obtained either in an inactive form, as, e.g., the structure of papain published by Kamphuis and co-workers (in which the thiol group of Cys25 is oxidized to SO3) [42], or in an inhibited form, as in protein-inhibitor complexes [5, 8]. On the other hand, theoretical treatment of the problem is hindered by the number of atoms that must be included. For this reason, accurate methods of calculation have often been applied to very inaccurate models or, at the other extreme, force fields have been used for docking purposes. In this work, we present our calculations at the AM1 level of both the active center of papain and a papain-leupeptin complex, including a significant number of amino acids relevant to the proteolytic activity of the enzyme. We have not sought to include solvent molecules as yet, and therefore our results must be viewed as preliminary.

LEUPEPTIN 

Leupeptin (Fig. 1) exists in three covalent forms in water at room temperature: leupeptin hydrate (42%), a cyclic carbinolamine (56%), and the free aldehyde (2%). The free aldehyde binds to papain with a binding constant equal to 2.2 × 10⁻¹¹ M and is the sole active form of leupeptin [43]. In the free aldehyde form, leupeptin is a very flexible molecule. We calculated AM1 partial potential energy surfaces (PES) for 21 torsional angles (Fig. 1), keeping only the three peptide bonds constant at 180°. After that, starting from a geometry that retained the best values of the 21 torsional angles, we fully optimized the structure (Fig. 2). The results for this last geometry are shown in Table I. Finally, each of the structures generated was superimposed on the crystal structure of leupeptin from the coordinates available in the Protein Data Bank [8]. The root-mean-square values of the superimposed structures lie between 1.95 and 2.09 Å, except for the fully optimized geometry (Table I), for which it is 1.67 Å. Therefore, our best calculated structure (ΔHf = -65.8 kcal/mol) has a close fit to the crystal structure (Fig. 3, Table I). In an attempt to improve the fit, we rotated torsional angles 3, 7, and 19 by 180° and fully optimized the structure so obtained. The resulting structure was less stable than our calculated structure by ΔH = 7.4 kcal/mol (Table I). We also fully optimized the geometry of leupeptin obtained directly from the crystal structure. The result was a third structure 6 kcal/mol less stable than our calculated structure. These results seem to imply that in the crystal form of the leupeptin-papain complex the host molecule is in a strained conformation. On thermodynamic grounds, formation of the thiohemiacetal alone cannot explain this conformation [44]; therefore, stabilization must come from hydrogen bonds and other polar interactions, either from the water (and methanol) molecules or the peptide backbone [41].

FIGURE 2. Optimized (AM1) leupeptin geometry.
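
The superposition step can be illustrated with a short sketch. The article does not say which fitting procedure was used, so the Kabsch algorithm below is an assumption, as are the function name and the use of NumPy. It computes the root-mean-square deviation between two sets of matched atomic coordinates after optimal rigid-body superposition.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched N x 3 coordinate sets after optimal superposition.

    P is translated and rotated onto Q (Kabsch algorithm); rows of P and Q
    must refer to the same atoms in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove the translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0.0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate the centered P (row vectors) and measure the residual.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))
```

Applied to a calculated geometry and the crystal coordinates with the atoms listed in the same order, the function returns a value in the same length units as the input (Å for PDB coordinates).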

LEUPEPTIN-PAPAIN  COMPLEX 

We now describe our model of the active site of papain and of the papain-leupeptin complex. Owing to the computational effort required, we limited the model to leupeptin and 11 amino acids relevant to the active center. From the L-domain of papain (Fig. 4), we chose to include the following residues: Gln19, which contributes to the stabilization of the oxyanion hole of the enzyme [45] and forms hydrogen bonds to the α-helix [8]; and the triad Gly23, Ser24, Cys25, which is the section of the α-helix involved in the oxyanion stabilization and the proteolytic activity [22]. We hope that we have included most of the electrostatic contribution of the α-helix to the stability of the catalytic triad [26, 30]. We also included Gly66, which is relevant to the binding of the host peptide during proteolysis [22], and Tyr67 and Pro68, two amino acids from the S2 subsite (hydrophobic pocket; for nomenclature see [46]). From the R-domain of papain, we included Ala160, which is relevant for future studies concerning the development of antimalarials [19]; Asp158, which has been thought to contribute to the stability of the catalytic triad [23] (see, however, [22]); and His159 and Asn175, directly involved in the proteolytic activity [22].

Table II shows the dihedral angles relevant for our models. Small differences can be observed between the dihedral angles in the peptide sections of the active site obtained from the crystal structure (column 1) and the AM1-optimized active site (after exclusion of leupeptin and water molecules; column 2). Most of the differences in the backbone dihedral angles must be ascribed to the energy optimization of the terminal groups of the fragments. Some differences, however, are relevant to the mechanism of enzymatic action. Figure 5 shows that exclusion of the host molecule (and water) leads in the AM1-calculated active site to deprotonation of the imidazole ring. This is followed by a torsion of the -CH2SH away from the imidazole ring of His159. A similar torsion of the terminal -NCH3 group of the Cys25 fragment brings this group closer to -CH2SH. However, it is possible that, at least partially, the rotation of the -CH2SH group is an artifact of the calculation. Another potentially relevant difference is a reorientation of the OH side chain of Ser24 toward -CH2SH, a residue hitherto not implicated in the catalytic action of papain. In the R-domain, one can observe a torsion of the plane of the carboxyl group of the side chain of Asp158. Since no similar rotation of the terminal bond of the Asp158 fragment is observed, this torsion of the carboxyl group (+30° from the crystal to the active site) could be associated with the mechanism of complex formation. Nevertheless, as pointed out by Dijkman and van Duijnen [30], the effect of the negative charge of Asp158 in the proton-transfer mechanism can be nearly completely screened by the solvent. A second important observation is the torsion of the side chain of His159 away from the S-H residue of Cys25, complementary to a similar movement of the -CH2SH residue.

TABLE I
Summary of dihedral angles (degrees) of leupeptin.

                  I                II                III              IV
                  AM1-optimized    Crystal           Modified         AM1-optimized
                                   structure         (from I)(a)      (from II)
                                   (from PDB [8])
ΔHf (kcal/mol)    -65.86           48.21             -58.45           -59.73

Dihedral
 1                  56.0             60.7              59.1             63.7
 2                 161.8            160.3             152.7            151.4
 3(b)              114.4            -38.4            -111.5           -105.1
 4                 160.8            134.8             171.7            157.1
 5                 -37.9            -33.8             -28.6              5.0
 6                  96.0            118.0              95.2            106.4
 7(b)              -17.4            168.2             -16.3           -159.5
 8                 -69.5            -90.1             -71.6           -120.1
 9                  96.4            118.8              95.2             62.3
10                  68.4             68.9              66.6             63.7
11                -178.1           -130.9            -175.8           -152.5
12                  -4.5             -0.2              -8.0             -8.2
13                   6.6              2.2              15.9             -3.3
14                 -71.5           -100.8             -66.2           -149.8
15                 163.1            178.7             166.1           -159.1
16                -179.6           -180.0            -178.1           -180.1
17                 179.1           -180.0            -180.0           -178.1
18                 -67.3            -65.2             -69.1           -110.5
19(b)              -72.3            -46.8             -52.9            -67.1
20                 179.7            178.0            -168.8           -177.8
21                -178.3            179.0             159.4            171.7

a,b Starting from structure I, these angles have been changed by 180° and the new geometry was fully optimized to reproduce the crystal structure (II).

FIGURE 3. Superimposition of the optimized (AM1) and the X-ray structure geometries of leupeptin.

FIGURE 4. A model of the active site of papain obtained by X-ray crystallography (PDB, [8]).

To optimize the leupeptin-papain complex, we initially moved the sulfur atom from the original crystal position. This leads to reconstruction of the -CHO group of leupeptin. Subsequent AM1 optimization (Fig. 6) moved the sulfur atom back to a position very close to the original position without proton transfer from the imidazole ring to the sulfur atom. The imidazole ring, however, rotates by +40° (from the crystal structure) in relation to the C-S bond axis. More important, the ImH+ group is displaced by 0.7 Å, away from the carboxyl group of Asp158, and at the same time, the other N-H group of the imidazole ring moves closer (1.2 Å) to Asn175. These results seem to imply that the host molecule is essential for proton transfer from S-H to imidazole. On the other hand, protonation of imidazole increases the strength of the N-H (imidazole)···O=C(Asn175) interaction, which is in line with earlier findings [44]. These results, and the fact that proton transfer from imidazole to sulfur occurs in the absence of leupeptin, seem to point either to a concerted mechanism or to proton transfer in a later step of complex formation.

TABLE II
Summary of the relevant dihedral angles (degrees) of the amino acid fragments used in the model.

                              Model I(a)                  Model II(b)
                              Crystal      AM1-           Crystal      AM1-
                              structure    optimized      structure    optimized
Gln19
  N9-C7-C6-C5                   89.7         66.7           --           66.4
  C7-C6-C5-C2                  170.0        148.7           --          164.6
  C6-C5-C2-N1                  -57.1        -71.2           --          -63.6
  C5-C2-N1-C84                -166.1       -138.2           --         -148.7
  C5-C2-C3-N91                -104.6       -107.2           --          178.0
Gly23
  N18-C15-C14-N13              -66.9        -53.2           --          -55.8
  C15-C14-N13-C101              78.1         85.0           --           83.7
Ser24
  N26-C20-C19-C22             -138.0        174.2           --          158.5
  C20-C19-C22-O23               73.3         67.3           --           71.3
  C22-C19-C22-C15             -132.6       -150.4           --         -115.0
  H25-O23-C22-C19              174.2        -72.3           --          -69.0
Cys25
  S31-C30-C27-N26              -63.5        -44.1          -80.3        -54.2
  C30-C27-N26-C20             -170.9        -83.2          179.9       -129.3
  N175-C28-C27-C30             -97.1        -52.7           38.0        -68.0
Gly66
  C127-N33-C34-C35             173.8        128.7           --          128.6
  N33-C34-C35-N133             161.6       -163.1           --         -168.0
Tyr67
  C35-N133-C135-C186          -111.0       -139.8           --         -130.2
  N133-C135-C184-C185           52.7         62.9           --           57.8
  C186-C135-C184-C135          -60.8        -52.9           --          -61.3
  C192-C187-C186-C135           79.7         72.5           --           64.4
Pro68
  N119-C40-C39-N38              39.2        -59.9           --          -65.6
  C40-C39-N38-C184              58.0         88.8           --           87.0
Asp158
  O52-C50-C49-C46              136.6       -158.6           --         -127.2
  C50-C49-C46-N45              -55.0        -54.7           --          -64.2
  C49-C46-N45-C154             -93.0       -136.6           --         -134.9
His159
  N60-C59-C58-C55              -57.0        -49.5           --          -13.7
  C59-C58-C55-N54              178.0        169.5           --         -168.9
  C58-C55-N54-C47              -88.6       -153.4           --         -102.6
  C55-N54-C47-C46              166.1        178.8           --          179.6
  N54-C47-C46-N45               -0.1         54.0           --          -10.6
  O48-C47-C46-N45             -177.9       -125.0           --          171.7
Ala160
  C71-C68-N67-C56             -127.2       -156.2           --         -140.6
  C68-N67-C56-C55              178.8        179.5           --         -175.7
  N67-C56-C55-N54             -136.4        -77.3           --         -128.4
  O57-C56-C55-N54               41.0        103.8           --           31.8
Asn175
  N80-C78-C77-C74              162.5        169.9           --          164.1
  C78-C77-C74-N73              169.8        158.1           --          162.5
  N145-C75-C74-N73            -157.2       -150.4           --         -147.6
  C139-N73-C74-C77            -124.1        -75.4           --         -108.9

a Model I: crystal structure excluding leupeptin and solvent molecules (from PDB [8]).
b Model II: crystal structure including leupeptin (from PDB [8]).

FIGURE 5. An optimized (AM1) model of the active site of papain.
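
The dihedral angles reported in Tables I and II are defined by four consecutive atoms. As a minimal sketch of how such an angle can be obtained from Cartesian coordinates, the function below uses the usual atan2 formulation, which returns a signed angle in the range -180° to 180° (0° for cis, ±180° for trans). The function and helper names are hypothetical, and the sketch does not claim to reproduce the procedure used by the authors or by MOPAC.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_deg(p1, p2, p3, p4):
    """Signed dihedral angle p1-p2-p3-p4 in degrees, in (-180, 180]."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane p1, p2, p3
    n2 = _cross(b2, b3)  # normal to the plane p2, p3, p4
    # atan2 form: y carries the sign, x the cosine of the angle.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For example, with atoms at (1, 0, 0), (0, 0, 0), (0, 0, 1), and (0, 1, 1) the function returns 90°, and moving the last atom to (-1, 0, 1) gives the trans arrangement.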

Table III shows relevant distances of the leupeptin-papain complex (crystal and AM1-optimized structures) and compares them with a recently calculated model [32] and with the crystal structure of an E-64-papain complex [49]. AM1 is known to underestimate hydrogen-bond formation, consistently giving distances and angles larger than the experimental values [36]. The calculated AM1 values for the leupeptin-papain complex reflect this fact. The exception is the NH···O=C interaction (backbone of leupeptin to backbone of papain). In this case, complex formation increases the hydrogen-bond strength. This can be understood on the grounds of steric interactions.

FIGURE 6. An optimized (AM1) model of the active site of the papain-leupeptin complex.

TABLE III
Summary of papain-leupeptin interactions.

Leupeptin    Leupeptin                 Amino    Amino acid                R, γ crystal(a)   R, γ AM1        [32](b)   [47](c)
residue      atom                      acid     atom                      Å (°)             Å (°)           (Å)       (Å)
P1(Arg)(d)   Oxyanion O                Gln19    Amide N of side chain     2.88 (9.7)(e)     2.95 (32.9)     3.10      2.87
P1(Arg)      Oxyanion O                Asp158   Carbonyl O of backbone    3.54 (21.0)       3.17 (18.8)     --        --
P2(Leu)      Carbonyl O of backbone    Gly66    Amide N of backbone       2.78 (14.9)       3.04 (22.0)     --        3.04
P2(Leu)      Amide N of backbone       Gly66    Carbonyl O of backbone    3.05 (10.9)       3.27 (18.9)     --        --

a Coordinates obtained from the Protein Data Bank (PDB) (see [8]).
b Corresponding atoms in the calculated active centers (estimated values).
c E-64-papain complex crystal structure.
d For nomenclature, see [22].
e R and γ are the H-bond parameters as defined by Schuster [48].
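
The trend in Table III can be made explicit with a few lines of arithmetic. The sketch below takes the crystal and AM1 distances R (in Å) exactly as listed in the table and reports the AM1 minus crystal difference for each contact; the labels and the function name are illustrative only.

```python
# (contact label, R crystal in Å, R AM1 in Å), values as listed in Table III.
HBOND_DISTANCES = [
    ("P1(Arg) oxyanion O / Gln19 side-chain amide N", 2.88, 2.95),
    ("P1(Arg) oxyanion O / Asp158 backbone carbonyl O", 3.54, 3.17),
    ("P2(Leu) backbone carbonyl O / Gly66 backbone amide N", 2.78, 3.04),
    ("P2(Leu) backbone amide N / Gly66 backbone carbonyl O", 3.05, 3.27),
]


def distance_shifts(rows):
    """Return (label, AM1 - crystal) pairs, rounded to 0.01 Å."""
    return [(label, round(r_am1 - r_crystal, 2))
            for label, r_crystal, r_am1 in rows]


if __name__ == "__main__":
    for label, shift in distance_shifts(HBOND_DISTANCES):
        print(f"{label}: {shift:+.2f} Å")
```

A positive shift means that the AM1-optimized contact is longer than in the crystal; three of the four listed contacts lengthen on optimization and one shortens.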


Concluding  Remarks 

In this article, we described our modeling of the protease inhibitor leupeptin and its interactions with the proteolytic enzyme papain. The results of our calculations show that the geometry of leupeptin in the crystal is less stable than our best calculated geometry by about 6.0 kcal/mol. Stabilization of the more strained structure must come from additional interactions between the host molecule and the peptide backbone, side chains, or solvent molecules in the active center.

Our calculations on the active center of papain show that proton transfer to the thiol group is preferred, in agreement with recent calculations at the SCF/6-31G* level by Beveridge (models 7 and 8 of [33]). Our calculations also show a torsion of the carboxyl group (+30° from the crystal) that could be involved in the mechanism of complex formation. This interaction, however, could be screened by solvent molecules [30]. The calculations also show a displacement of the imidazole ring of His159 away from the S-H residue of Cys25.

The calculations on the leupeptin-papain complex show that reconstruction of the aldehyde -CHO bond of leupeptin led to a leupeptin-papain complex in which the S⁻···HIm⁺···O=C(Asn175) catalytic triad is strengthened. Moreover, comparison of some relevant interactions between leupeptin and papain shows that AM1 calculations reproduce fairly well experimental hydrogen-bond distances and angles.

ACKNOWLEDGMENTS 

This research received partial financial support from the Brazilian agencies CNPq and FAPERJ. We are grateful to the Núcleo de Atendimento em Computação de Alto Desempenho (NACAD-COPPE-UFRJ) for the grant of computational time on their Cray J90. The authors also thank Prof. Michael C. Zerner for the use of the computational facilities at the Quantum Theory Project (Florida).


References 

1.  A.  R.  Kennedy,  Cancer  Res.  54(7)  (Suppl.),  1999s  (1994). 

2.  G.  Leto,  F.  M.  Tumminello,  N.  Gebbia,  B.  Woynarowska, 
and  R.  J.  Bernacki,  Anticancer  Res.  10(1),  265  (1990). 

3.  J.  Brtko,  J.  Knopp,  and  M.  E.  Baker,  Mol.  Cell.  Endocrinol. 
93,  81  (1993). 

4.  I.  Eto  and  C.  J.  Grubbs,  Biochem.  J.  283,  209  (1992). 

5.  I.  V.  Kurinov  and  R.  W.  Harrison,  Prot.  Sci.  5,  752  (1996). 


INTERNATIONAL  JOURNAL  OF  QUANTUM  CHEMISTRY 




BARREIRO,  BICCA  DE  ALENCASTRO,  AND  DA  MOTTA  NETO 


6.  H.  Umezawa,  Acta  Biol.  Med.  Ger.  36(11-12),  1899  (1977). 

7.  H.  Umezawa,  Annu.  Rev.  Microbiol.  36,  75  (1982). 

8.  E.  Schroder,  C.  Phillips,  E.  Garman,  K.  Harlos,  and  C. 
Crawford,  FEBS  Lett.  315(1),  38  (1993). 

9.  D.  E.  Hudson-Taylor,  S.  A.  Dolan,  F.  W.  Klotz,  H.  Fujioka, 
M.  Aikawa,  E.  V.  Koonin,  and  L.  H.  Miller,  Mol.  Microbiol. 
15,  463  (1995). 

10.  J.  J.  Marr  and  M.  Muller,  Eds.,  Biochemistry  and  Molecular 
Biology  of  Parasites  (Academic  Press,  London,  1995). 

11.  B.  C.  Elford,  G.  M.  Cowan,  and  D.  J.  P.  Ferguson,  Biochem. 
J.  308,  361  (1995). 

12. T. E. Wellems, Parasit. Today 7, 110 (1991).

13. W. H. Wernsdorfer and D. Payne, Pharm. Therap. 50, 95 (1991).

14.  I.  Ansorge,  D.  Jeckel,  F.  Wieland,  and  K.  Lingelbach, 
Biochem.  J.  308,  335  (1995). 

15.  J.  H.  McKerrow,  Exp.  Parasitol.  68,  111  (1989). 

16. J. D. Lonsdale-Eccles, G. W. N. Mpinbaza, Z. R. M. Nkhungulu, J. Olobo, L. Smith, O. M. Tosomba, and D. J. Grab, Biochem. J. 305, 549 (1995).

17.  J.  S.  Bond  and  E.  Butler,  Annu.  Rev.  Biochem.  56,  333  (1987). 

18.  J.  H.  McKerrow,  E.  Sun,  P.  J.  Rosenthal,  and  J.  Bouvier, 
Annu.  Rev.  Microbiol.  47,  821  (1993). 

19. C. M. R. de Sant'Anna, R. Bicca de Alencastro, C. R. Rodrigues, G. Barreiro, E. Barreiro, J. D. da Motta Neto, and A. C. C. Freitas, Int. J. Quantum Chem. Quantum Biol. Symp. 23, 111 (1996).

20.  J.  Drenth,  K.  H.  Kalk,  and  H.  M.  Swen,  Biochemistry  15, 
3731  (1976). 

21.  I.  G.  Kamphuis,  K.  H.  Kalk,  M.  B.  A.  Swarte,  and  J.  Drenth, 
J.  Mol.  Biol.  179,  233  (1984). 

22.  A.  C.  Storer  and  R.  Menard,  Methods  Enzymol.  244,  486 
(1994). 

23. E. N. Baker and J. Drenth, in Biological Macromolecules and Assemblies, Vol. 3: Active Sites of Enzymes, F. A. Jurnak and A. McPherson, Eds. (Wiley, New York, 1987), pp. 313-368.

24.  V.  Turk  and  N.  Bode,  in  Innovations  in  Proteases  and  Their 
Inhibitors,  F.  X.  Aviles,  Ed.,  (W.  de  Gruyter,  New  York, 
1993),  pp.  161-178. 

25.  W.  Bode  and  R.  Huber,  in  Innovations  in  Proteases  and  Their 
Inhibitors,  F.  X.  Aviles,  Ed.,  (W.  de  Gruyter,  New  York, 
1993),  pp.  81-122. 

26.  R.  Lavery,  A.  Pullman,  and  Y.  K.  Wen,  Int.  J.  Quantum 
Chem.  24(4),  353  (1983). 

27.  P.  Th.  van  Duijnen,  B.  Th.  Thole,  R.  Broer,  and  W.  C. 
Nieuwpoort,  Int.  J.  Quantum  Chem.  17,  651  (1980). 


28.  J.  A.  C.  Rullmann,  M.  N.  Bellido,  and  P.  Th.  van  Duijnen, 
J.  Mol.  Biol.  206,  101  (1989). 

29.  D.  Arad,  R.  Langridge,  and  P.  A.  Kollman,  J.  Am.  Chem. 
Soc.  112,  491  (1990). 

30.  J.  P.  Dijkman  and  P.  Th.  van  Duijnen,  Int.  J.  Quantum 
Chem.  Quantum  Biol.  Symp.  18,  49  (1991). 

31.  G.  D.  Duncan,  C.  P.  Huber,  and  W.  J.  Welsh,  J.  Am.  Chem. 
Soc.  114,  5784  (1992). 

32.  M.  J.  Harrison,  N.  A.  Burton,  I.  H.  Hillier,  and  I.  R.  Gould, 
Chem.  Commun.  2769  (1996). 

33.  A.  J.  Beveridge,  Prot.  Sci.  5,  1355  (1996). 

34.  Y.  Lin  and  W.  J.  Welsh,  J.  Mol.  Graph.  14,  62  (1996). 

35.  N.  Swamy  Kandadai  and  M.  Rami  Reddy,  J.  Comput. 
Chem.  17,  1328  (1996). 

36.  M.  J.  S.  Dewar,  E.  G.  Zoebisch,  E.  F.  Healy,  and  J.  J.  P. 
Stewart,  J.  Am.  Chem.  Soc.  107(13),  3902  (1985). 

37. M. J. S. Dewar and E. G. Zoebisch, J. Mol. Struct. (Theochem) 180, 1 (1988).

38.  A.  A.  Voityuk,  J.  Struct.  Chem.  29(1),  120  (1988). 

39. J. J. P. Stewart, MOPAC 7.0 (Frank J. Seiler Research Lab.,
US Air Force Academy, Colorado Springs, 1993).

40. A. N. Glazer and E. L. Smith, in The Enzymes, 3rd ed., P. D.
Boyer, Ed. (Academic Press, New York, 1971), Vol. 3.

41.  P.  J.  Berti,  C.  H.  Faerman,  and  A.  C.  Storer,  Biochemistry  30, 
1394  (1991). 

42.  I.  G.  Kamphuis,  K.  H.  Kalk,  M.  B.  A.  Swarte,  and  J.  Drenth, 
J.  Mol.  Biol.  179,  233  (1984). 

43.  R.  M.  Schultz,  P.  Varma-Nelson,  R.  Ortiz,  K.  A.  Kozlowski, 
A.  T.  Orawski,  P.  Pagast,  and  A.  Frankfater,  J.  Mol.  Biol. 
264(3),  1497  (1989). 

44.  S.  L.  Bearne  and  R.  Wolfenden,  CHEMTRACTS-Org.  Chem. 
8,  288  (1995). 

45.  R.  Menard,  J.  Carriere,  P.  Laflamme,  C.  Plouffe,  H.  E. 
Khouri,  T.  Vernet,  D.  C.  Tessier,  D.  Y.  Thomas,  and  A.  C. 
Storer,  Biochemistry  30,  8924  (1991). 

46.  A.  Berger  and  I.  Schechter,  Philos.  Trans.  R.  Soc.  Lond.  B 
257,  249  (1970). 

47.  E.  N.  Baker  and  J.  Drenth,  in  Biological  Macromolecules  and 
Assemblies,  Vol.  3:  Active  Sites  of  Enzymes,  F.  A.  Jurnak  and 
A.  McPherson,  Eds.  (Wiley,  New  York,  1987),  pp.  313-368. 

48.  P.  Schuster,  in  The  Hydrogen  Bond — Recent  Developments  in 
Theory  and  Experiments,  P.  Schuster,  G.  Zundel,  and  C. 
Sandorfy,  Eds.  (North-Holland,  Amsterdam,  1976),  pp. 
25-163. 

49.  K.  I.  Varughese,  F.  R.  Ahmed,  P.  R.  Carey,  S.  Hasnain,  C.  P. 
Huber,  and  A.  C.  Storer,  Biochemistry  28,  1330  (1989). 




VOL  65,  NO.  6 


Theoretical Studies of Inclusion
Complexes of α- and β-Cyclodextrin with
Benzoic Acid and Phenol


MING-JU  HUANG,1  JOHN  D.  WATTS,2  NICHOLAS  BODOR1 

1  Center  for  Drug  Discovery,  College  of  Pharmacy,  P.O.  Box  100497,  Health  Science  Center,  Gainesville, 
Florida  32610-0497 

2 Quantum Theory Project, P.O. Box 118435, 362 Williamson Hall, University of Florida, Gainesville,
Florida 32611-8435

Received  2  June  1997;  accepted  19  August  1997 


ABSTRACT: A series of semiempirical molecular orbital calculations using the AM1 method were performed on the inclusion complexes of α- and β-cyclodextrin with benzoic acid and phenol in the "head-first" and "tail-first" positions. The AM1 results show that the α-cyclodextrin complexes with both guest compounds are more stable in the "head-first" position than in the "tail-first" position, while for β-cyclodextrin the "tail-first" position is more stable with phenol and the "head-first" position is more stable with benzoic acid. The driving forces for complex formation were investigated based on different intramolecular and intermolecular interactions. In addition, 1SCF AM1 calculations were performed on the β-cyclodextrin complexes with benzoic acid in the "tail-first" and "head-first" positions with the benzoic acid moved stepwise along the Z-axis of the β-cyclodextrin principal axis coordinate system. © 1997 John Wiley & Sons, Inc. Int J Quant Chem 65: 1135-1152, 1997


Introduction 

Cyclodextrins (CDs) were first isolated in 1891 by Villiers [1] as degradation products of starch. Schardinger characterized them as cyclic oligosaccharides in 1904 [2], and in 1935 Freudenberg and Jacobi [3] described them as macrocyclic compounds built from glucopyranose units linked by α-(1,4)-glycosidic bonds. The most common CDs are α-CD, β-CD, and γ-CD, consisting of six, seven, and eight α-D-glucopyranosyl residues, respectively. The CD molecules have an endolipophilic cavity and are rendered water-soluble by many outward-pointing OH groups. The presence of a hydrophobic central cavity enables many different (organic, inorganic, neutral, and ionic) molecules to be incorporated into the cavity, both in the solid state and in solution. Furthermore, complex formation of CDs with guest compounds such as drugs and insecticides gives rise to new physicochemical features, leading to practical uses in pharmaceutical chemistry, food technology, and other industrial areas [4, 5].

Correspondence to: N. Bodor.

The considerable power of complexation of the CDs was first recognized by Freudenberg and Cramer [6] in the late 1940s and was substantiated by complexation studies by Cramer, Saenger, and others [4, 7]. Evidence that the incorporated guest sits at the center of the cavity was obtained by determining the association constant for the solvated complex and was confirmed by X-ray crystallography. Since then, many methods have been used to study CDs and their inclusion complexes, including X-ray crystallography [8] for solid-state analyses, NMR spectroscopy [9] for solution-phase studies (usually performed in an aqueous environment or in polar organic solvents), ESR spectroscopy [10], and electrochemical methods [11].

No covalent bonds are formed or broken during complex formation, and the complexed molecules are in equilibrium with uncomplexed molecules in aqueous solution. The driving forces for complex formation have been attributed to the release of entropy-rich water molecules from the cavity [12], van der Waals interactions [13, 14], hydrogen bonding [15-17], hydrophobic interactions [14, 18], release of ring strain in the cyclodextrin molecule [16], and changes in solvent-surface tension [19]. However, the relative contributions and even the nature of the different forces are not well known. The thermodynamic parameters enthalpy (ΔH) and entropy (ΔS) can be obtained from the temperature dependence of the stability constant of the CD complex. ΔH is always negative, but ΔS can be positive or negative [4]. The central cavity of the CD molecule is relatively hydrophobic, but CDs are able to form inclusion complexes with a wide variety of compounds ranging from very polar inorganic ions to completely nonpolar organic molecules. Hydrophobicity alone therefore cannot fully explain the complexation. CD complexes are usually studied by NMR spectroscopy, but the kinetics of complex formation make it difficult to draw conclusions from these experiments [20]. We studied the inclusion of phenol and benzoic acid in α-CD and β-CD using the AM1 semiempirical molecular orbital method [21]. We investigated the driving forces of complex formation based on a combination of several intermolecular interactions and possibly hydrophobic effects, such as steric fit or size selectivity (this would be the primary criterion), van der Waals interactions, hydrogen bonding, dispersive forces, dipole-dipole interactions, charge-transfer interactions, and electrostatic interactions.
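As an illustration of how ΔH and ΔS follow from the temperature dependence of a stability constant K, the sketch below fits the van 't Hoff relation ln K = -ΔH/(RT) + ΔS/R by linear least squares. It is not part of the calculations reported here: it assumes ΔH and ΔS are constant over the temperature range, and the function name and the temperatures and constants in the demonstration are hypothetical.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3


def vant_hoff_fit(temps_kelvin, stability_constants):
    """Estimate (dH, dS) from ln K = -dH/(R T) + dS/R by least squares.

    dH is returned in kcal/mol and dS in kcal/(mol K). Assumes dH and dS
    do not vary with temperature over the fitted range.
    """
    x = [1.0 / t for t in temps_kelvin]           # 1/T
    y = [math.log(k) for k in stability_constants]  # ln K
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # equals -dH / R
    intercept = y_mean - slope * x_mean  # equals dS / R
    return -R_KCAL * slope, R_KCAL * intercept


if __name__ == "__main__":
    # Hypothetical data generated from dH = -4.0 kcal/mol, dS = -0.005 kcal/(mol K).
    temps = [288.0, 298.0, 308.0, 318.0]
    ks = [math.exp(4.0 / (R_KCAL * t) - 0.005 / R_KCAL) for t in temps]
    print(vant_hoff_fit(temps, ks))
```

A negative fitted ΔH with ΔS of either sign would be consistent with the behavior described above for CD complexes.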

Recently, there have been several theoretical studies of CDs and CD inclusion complexes. A molecular modeling study of structural effects on the binding of amine drugs with the diphenylmethyl functionality to CDs was reported by Tong et al. [22]. A conformational study of β-CD complexes with nootropic drugs using molecular mechanics (MM) was performed by Amato et al. [23]. Conformational analysis of β-CD complexes with a variety of molecules using the MM2 force field was performed by Kostense et al. [24]. Some other molecular mechanical studies of CDs have also been reported [25-35]. The conformational behavior of complexes of α-CD with p-chlorophenol and p-hydroxybenzoic acid in water was studied using molecular dynamics simulations by van Helden et al. [36]. In addition, several molecular dynamics studies of CDs were reported [37-42]. A series of fixed-geometry quantum chemical studies of CDs were performed with the semiempirical CNDO or CNDO/2 methods [43-46]. We recently performed a series of semiempirical calculations on α- and β-CDs using the AM1 method [21], including full geometry optimization [47]. We now report a series of AM1 calculations on complexes with benzoic acid and phenol. In contrast to the CNDO studies, these calculations include complete and unrestricted geometry optimization and conformational analysis. The aim of this study was to compare the stabilization energies and to investigate the driving force for the inclusion of phenol and benzoic acid in α-CD and β-CD.


Methods 

AM1 calculations on the inclusion complexes of phenol and benzoic acid with α-CD and β-CD and on the free CDs were performed using a modified version of the AMPAC program from QCPE [21]. Phenol and benzoic acid were studied using the MM2 and MOPAC programs on the Tektronix CAChe (Computer Assisted Chemistry) workstation. MM2 calculations were run on the CAChe workstation to determine starting geometries for the AM1 calculations. The size and conformational flexibility of the molecules of the present study make it impossible to study all possible conformers. To find the lowest-energy structures, we tried several starting points for the AM1 optimizations, including some determined by molecular dynamics simulations.


Results  and  Discussion 

Calculated partition coefficients, volume, surface area, ovality, and lengths of the guest compounds, benzoic acid and phenol, are given in Table I. The geometrical data were determined from the AM1-optimized geometries and van der Waals radii. The partition coefficients were calculated by the BLOGP method [48] and depend on geometrical and electronic parameters (such as atomic charges and dipole moment). Table II contains further molecular dimensions of benzoic acid and phenol.

Table III contains enthalpies of formation (ΔH_f), dipole moments, and HOMO and LUMO energies of phenol, benzoic acid, α-CD, and β-CD, and of the inclusion complexes of phenol and benzoic acid with α-CD and β-CD. All data refer to AM1-optimized geometries.* Enthalpies of complexation are also given. These are defined by

$$\Delta\Delta H_f = \Delta H_f(\mathrm{CD} + \mathrm{guest}) - \Delta H_f(\mathrm{CD}) - \Delta H_f(\mathrm{guest}).$$

Thus, a negative value for ΔΔH_f indicates that complexation is thermodynamically favorable. The "head-first" and "tail-first" orientations are depicted in Figure 1.
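The definition of the complexation enthalpy can be checked directly against the heats of formation listed in Table III. The short sketch below does this arithmetic; the numerical values are copied from Table III, while the dictionary layout and function names are illustrative choices and not part of the original work.

```python
# AM1 heats of formation (kcal/mol) copied from Table III.
HF_FREE = {
    "benzoic acid": -68.0,
    "phenol": -22.2,
    "alpha-CD": -1416.0,
    "beta-CD": -1656.6,
}

HF_COMPLEX = {
    ("alpha-CD", "benzoic acid", "head first"): -1489.1,
    ("alpha-CD", "benzoic acid", "tail first"): -1482.0,
    ("beta-CD", "benzoic acid", "head first"): -1725.0,
    ("beta-CD", "benzoic acid", "tail first"): -1716.9,
    ("alpha-CD", "phenol", "head first"): -1439.4,
    ("alpha-CD", "phenol", "tail first"): -1434.2,
    ("beta-CD", "phenol", "head first"): -1674.8,
    ("beta-CD", "phenol", "tail first"): -1679.2,
}


def complexation_enthalpy(host, guest, orientation):
    """ddH_f = dH_f(CD + guest) - dH_f(CD) - dH_f(guest), rounded to 0.1 kcal/mol."""
    value = HF_COMPLEX[(host, guest, orientation)] - HF_FREE[host] - HF_FREE[guest]
    return round(value, 1)


def preferred_orientation(host, guest):
    """Orientation with the lower (more favorable) complexation enthalpy."""
    return min(
        ("head first", "tail first"),
        key=lambda orientation: complexation_enthalpy(host, guest, orientation),
    )


if __name__ == "__main__":
    for key in HF_COMPLEX:
        print(key, complexation_enthalpy(*key))
```

Run over all eight entries, this arithmetic reproduces the ΔΔH_f column of Table III.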

Table III shows that stable inclusion complexes of both guests with both α- and β-CD can be formed. Molecular-size considerations show that formation of inclusion complexes of benzoic acid and phenol is plausible. For example, Van Hooidonk and Breebaart-Hansen [50] estimated the diameter of the β-CD cavity to be 7.5 Å, which is to be compared with the widths of benzoic acid and phenol that may be deduced from the internuclear distances in Table II. Orientations other than "head-first" or "tail-first," i.e., "crosswise," are less likely from size considerations, and no evidence for these was found in the calculations. Experimentally, Siimer et al. [51, 52] showed that both α-CD and β-CD form stable 1:1 complexes with benzoic acid. Interestingly, γ-CD forms unstable 1:1 and 1:2 inclusion complexes.

*The ΔH_f values for free α- and β-CD are slightly lower (2 and 9 kcal/mol) than those obtained in our previous study of free cyclodextrins. Apparently, the structures found previously were only local minima and not global minima.

TABLE I

Calculated partition coefficients, volume (Å³), surface area (Å²), ovality, and height (Å) of phenol and benzoic acid.

Compound            Log P    Volume    Surface    Ovality    Height
(a) Phenol          1.299     91.84     120.50      1.224     5.662
(b) Benzoic acid    1.610    110.74     143.02      1.282     7.012


TABLE II

Molecular dimensions of phenol and benzoic acid.

[Structure diagrams with atom numbering not reproduced.]

Distance     Phenol    Benzoic acid
R(8-11)      4.997     4.994
R(9-12)      5.003     5.001
R(10-13)     5.662
R(8-12)      4.336     4.327
R(9-11)      4.322     4.324
R(13-14)               2.210
R(10-15)               7.012


TABLE III

Heats of formation (ΔH_f, kcal/mol), dipole moment (Debye), HOMO energy (eV), and LUMO energy (eV) for AM1-optimized geometries of benzoic acid, phenol, α-CD, β-CD, and complexes of α- and β-CD with benzoic acid and phenol in the "head-first" and "tail-first" positions; stabilization energies (ΔΔH_f) are in kcal/mol.

Compound                            ΔH_f       Dipole    HOMO      LUMO     ΔΔH_f
Benzoic acid                         -68.0(a)   2.4      -10.08    -0.47
Phenol                               -22.2(a)   1.2       -9.11     0.40
α-CD                               -1416.0      4.0      -10.48     1.58
β-CD                               -1656.6      8.0      -10.34     1.26
α-CD + benzoic acid "head first"   -1489.1      4.7      -10.21    -0.69    -5.1
α-CD + benzoic acid "tail first"   -1482.0      5.1      -10.26    -0.88     2.0
β-CD + benzoic acid "head first"   -1725.0      6.6      -10.37    -0.82    -0.4
β-CD + benzoic acid "tail first"   -1716.9      8.3      -10.30    -0.83     7.7
α-CD + phenol "head first"         -1439.4      6.6       -9.28     0.26    -1.2
α-CD + phenol "tail first"         -1434.2      8.9       -9.32     0.13     4.0
β-CD + phenol "head first"         -1674.8      2.4       -9.33     0.18     4.0
β-CD + phenol "tail first"         -1679.2      7.5       -9.49     0.09    -0.4

(a) Experimental ΔH_f for phenol and benzoic acid are -23.0 and -70.3 kcal/mol [49].


Looking at the enthalpies of complexation in Table III, we see a number of trends: (1) the hydroxyl group of phenol and the carboxylate group of benzoic acid prefer to face the primary 6-hydroxyl oxygen ("head first") rather than the secondary 2- or 3-hydroxyl oxygen ("tail first") in the α-CD inclusion complexes; (2) the hydroxyl group of phenol prefers to face the secondary 2- or 3-hydroxyl oxygen ("tail first") in the β-CD inclusion complexes, but the carboxylate group of benzoic acid prefers to face the primary 6-hydroxyl oxygen ("head first"); and (3) phenol and benzoic acid form more stable inclusion complexes with α-CD than with β-CD.

For α-CD and benzoic acid, the arrangement in which the carboxylate group of benzoic acid faces the primary 6-hydroxyl oxygen ("head first") has the largest stabilization energy (5.1 kcal/mol). This orientational preference is in agreement with previous CNDO or CNDO/2 studies [43-46], which indicated that the dipole moments of guest molecules are antiparallel to the dipole moment of host α-CD in the crystalline state.

FIGURE 1. The two possible penetration pathways for benzoic acid and phenol.

Of the four inclusion complexes studied, three prefer the "head-first" orientation and just one, the β-CD and phenol complex, prefers the "tail-first" orientation. The preference for the "tail-first" orientation in the β-CD and phenol complex has also been suggested by others [53, 54]. The preference for the "head-first" orientation is in line with dipole-dipole interactions. That is not to say, however, that the guest-host interaction can be well modeled by treating the two molecules as point dipoles; specific intermolecular interactions and subtle geometry changes are likely to determine the stability order. One approach is to consider the numbers of intermolecular and intramolecular hydrogen bonds. A hydrogen bond is defined as an O-H···O interaction in which the H···O distance is less than or equal to 3.00 Å and the angle at H is larger than 90°. The problems of such cutoff criteria are discussed in greater detail elsewhere [55-58]. The intramolecular hydrogen bonds between the 2-hydroxyl and 3-hydroxyl groups of adjacent glucose units and the intramolecular hydrogen bonds between the 2-hydroxyl and 3-hydroxyl groups and their nearby oxygen bridges are listed in detail in Tables IV and V. The numbers of intermolecular hydrogen-bond interactions for the eight different inclusion complexes are shown in Table VI. The total number of intramolecular hydrogen bonds between the 2-hydroxyl and 3-hydroxyl groups of adjacent glucose units (intra 2-3), the total number of intramolecular hydrogen bonds between the 2-hydroxyl and 3-hydroxyl groups and their nearby oxygen bridges (intra bridge), the total number of intermolecular hydrogen bonds between the host and guest (inter), and the macrocyclic geometric constants O(4)···O(4') and O(4)···O(4')···O(4'') are shown in Table VII.
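The geometric cutoff used to count hydrogen bonds (H···O distance of at most 3.00 Å and an angle at H larger than 90°) is simple to apply to Cartesian coordinates. The following is a minimal sketch of such a test, not the authors' analysis code; the function name, argument order, and the example coordinates are assumptions made for illustration.

```python
import math


def is_hydrogen_bond(donor_o, hydrogen, acceptor_o, max_dist=3.00, min_angle=90.0):
    """Apply the O-H...O cutoff criterion to three Cartesian points (in angstroms).

    Returns (flag, H...O distance, O-H...O angle at H in degrees).
    """
    dist = math.dist(hydrogen, acceptor_o)
    # Vectors from H to the donor oxygen and from H to the acceptor oxygen.
    to_donor = [d - h for d, h in zip(donor_o, hydrogen)]
    to_acceptor = [a - h for a, h in zip(acceptor_o, hydrogen)]
    dot = sum(u * v for u, v in zip(to_donor, to_acceptor))
    norm = math.hypot(*to_donor) * math.hypot(*to_acceptor)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    angle = math.degrees(math.acos(cos_angle))
    return (dist <= max_dist and angle > min_angle), dist, angle


if __name__ == "__main__":
    # Hypothetical linear arrangement: donor O, H 0.96 A away, acceptor 2.0 A beyond H.
    print(is_hydrogen_bond((0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (2.96, 0.0, 0.0)))
```

Because both thresholds are hard cutoffs, contacts near 3.00 Å or 90° can switch between counted and not counted after small geometry changes, which is one reason the counts should be read together with the distances and angles themselves.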

From Table VII, we see that the number of intermolecular hydrogen-bond interactions correlates with the orientation of the guest molecule except in the case of β-CD with benzoic acid. "Head-first" β-CD with benzoic acid has one more intramolecular hydrogen-bond interaction, O9···H145-O73, which comes from the 2-hydroxyl group and the oxygen bridge. The "tail-first" orientation has one intermolecular hydrogen-bond interaction, O74-H146···O161. However, judging by the distances, the "head-first" orientation probably has stronger hydrogen bonds than does the "tail-first": compare 2.34 Å for "head first" and 2.46 Å for "tail first."

From Table VII, we see that the macrocyclic constants O(4)···O(4') and O(4)···O(4')···O(4'') increase slightly from α-CD to β-CD, which is in agreement with the results of Saenger [59]. The values reported there for O(4)···O(4') in α- and β-CD are 4.23 and 4.36 Å, respectively, and for O(4)···O(4')···O(4'') in α- and β-CD, 120° and 128°, respectively. As mentioned by Saenger, the macrocyclic geometric constant, the average O(4)···O(4') distance (the primed atom belongs to the next glucose) forming the edges of the macrocycle, is more or less constant within each member of the CD family, and the individual differences between α- and β-CD arise because the glucose unit has to adjust to the respective radius of the CD.
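The quoted O(4)···O(4')···O(4'') angles are close to what one would expect if the O(4) atoms sat at the vertices of a planar regular polygon, and the same idealization gives a rough ring radius from the O(4)···O(4') edge length. The sketch below works this out; the regular-polygon model is an assumption introduced here for illustration and is not a statement about the actual, slightly irregular CD geometries.

```python
import math


def interior_angle_deg(n_units):
    """Interior angle (degrees) of a planar regular polygon with n_units vertices."""
    return 180.0 * (n_units - 2) / n_units


def circumradius(edge_length, n_units):
    """Radius of the circle through the vertices of a regular polygon."""
    return edge_length / (2.0 * math.sin(math.pi / n_units))


if __name__ == "__main__":
    # Edge lengths are the O(4)...O(4') values quoted in the text (angstroms).
    for name, n_units, edge in (("alpha-CD", 6, 4.23), ("beta-CD", 7, 4.36)):
        print(name, interior_angle_deg(n_units), circumradius(edge, n_units))
```

Under this idealization the hexagon gives exactly 120° and the heptagon about 128.6°, in line with the 120° and 128° quoted above, while the ring radius grows from about 4.2 Å to about 5.0 Å as a seventh glucose unit is added at nearly constant edge length.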

Fixed-geometry AM1 calculations (1SCF) were performed on the host and guest molecules using their geometries in the optimized complex. The enthalpies of formation calculated at these geometries are shown in Table VIII along with their differences from the enthalpies at the optimized geometries. From Table VIII, we find that the most stable inclusion complexes are formed when the host geometry in the complex is closest to the optimized host geometry, except in the case of β-CD with phenol. The 1SCF enthalpy of β-CD at its geometry in the "head-first" β-CD-phenol complex is closer to the enthalpy at the β-CD optimum geometry than is the 1SCF enthalpy at its geometry in the "tail-first" complex, even though the "tail-first" complex has more stabilization energy. We believe this is because formation of three intermolecular hydrogen bonds in the "tail-first" complex leads to a larger change in the β-CD geometry than in the "head-first" complex.

Comparisons of the AM1 structures of the four stable inclusion complexes with those of isolated α- or β-CD are given in Tables IX-XII, which present the average values and corresponding ranges of the bond lengths, bond angles, and dihedral angles. The average bond lengths for the four neutral inclusion complexes are not changed from those in the isolated α- or β-CD. The average C2-C3-C4, C3-C4-C5, and O6-C6-C5 bond angles for α-CD with benzoic acid in the "head-first" position are smaller than those in the isolated α-CD. The average dihedral angles for α-CD with benzoic acid in the "head-first" position are larger than those in the isolated α-CD, except for the C5-O5-C1-C2 and O5-C1-C2-C3 dihedral angles. The average bond angles and dihedral angles for β-CD with benzoic acid in the "head-first" position are very close to those in the isolated β-CD, except for the C2-C3-C4-C5 and C3-C4-C5-O5 dihedral angles, which are significantly smaller than those in the isolated β-CD. The average bond angles and dihedral angles for α-CD with phenol in the "head-first" position are very close to those in the isolated α-CD except for the C1-C2-C3, C4-C5-C6,

TABLE IV


Bond distance (Å) for each intramolecular hydrogen-bond interaction between the 2-hydroxyl and the 3-hydroxyl groups of adjacent glucose units of the host molecule in the X-ray conformer, AM1-optimized conformer, and in eight inclusion complexes, with the cutoff criteria for a hydrogen-bond O-H···O interaction of H···O distances equal to or less than 3.00 Å and angles at H larger than 90°.

Host molecule                  No. of     H···O atoms     H···O bond     O···H-O atoms        Angle at H
                               O-H···O                    length (Å)                          (degrees)

α-CD (X-ray)                   5          O7···H85        2.18           O7···H85-O19         149.47
                                          O18···H95       2.03           O18···H95-O30        138.93
                                          O29···H105      1.99           O29···H105-O41       150.96
                                          O40···H115      2.00           O40···H115-O52       139.99
                                          O8···H124       2.08           O8···H124-O62        120.07

α-CD (AM1)                     5          O7···H85        2.29           O7···H85-O19         160.70
                                          O18···H95       2.21           O18···H95-O30        159.80
                                          O29···H105      2.15           O29···H105-O41       164.31
                                          O40···H115      2.11           O40···H115-O52       160.99
                                          O62···H75       2.16           O62···H75-O8         163.55

β-CD (X-ray)                   7          O19···H85       2.19           O19···H85-O7         137.60
                                          O73···H86       2.07           O73···H86-O8         149.50
                                          O30···H95       2.08           O30···H95-O18        140.70
                                          O41···H105      2.18           O41···H105-O29       128.28
                                          O52···H115      2.09           O52···H115-O40       128.59
                                          O63···H125      2.25           O63···H125-O51       115.26
                                          (O51···H136     2.53)          (O51···H136-O63      96.46)
                                          O62···H146      2.03           O62···H146-O74       146.12

β-CD (AM1)                     7          O73···H86       2.13           O73···H86-O8         160.45
                                          O7···H96        2.14           O7···H96-O19         153.22
                                          O18···H106      2.12           O18···H106-O30       168.43
                                          O29···H116      2.21           O29···H116-O41       154.27
                                          O63···H125      2.17           O63···H125-O51       139.26
                                          O40···H126      2.13           O40···H126-O52       165.37
                                          O74···H135      2.25           O74···H135-O62       177.47

α-CD + BA "head first"         6          O62···H75       2.13           O62···H75-O8         169.40
                                          O7···H85        2.24           O7···H85-O19         163.56
                                          O18···H95       2.27           O18···H95-O30        170.25
                                          O29···H105      2.20           O29···H105-O41       150.89
                                          O40···H115      2.17           O40···H115-O52       170.14
                                          O63···H114      2.90           O63···H114-O51       134.07
                                          (O51···H125     2.88)          (O51···H125-O63      135.20)

α-CD + BA "tail first"         6          O62···H75       2.10           O62···H75-O8         154.66
                                          O7···H85        2.12           O7···H85-O19         159.77
                                          O18···H95       2.08           O18···H95-O30        158.20
                                          O29···H105      2.10           O29···H105-O41       164.33
                                          O63···H114      2.14           O63···H114-O51       150.72
                                          O40···H115      2.12           O40···H115-O52       160.55

β-CD + BA "head first"         7          O73···H86       2.20           O73···H86-O8         153.26
                                          O7···H96        2.14           O7···H96-O19         155.88
                                          O18···H106      2.12           O18···H106-O30       169.59
                                          O29···H116      2.30           O29···H116-O41       153.49
                                          O63···H125      2.22           O63···H125-O51       144.49
                                          O40···H126      2.21           O40···H126-O52       160.52
                                          O74···H135      2.16           O74···H135-O62       167.06

β-CD + BA "tail first"         7          O73···H86       2.17           O73···H86-O8         159.37
                                          O7···H96        2.18           O7···H96-O19         150.04
                                          O18···H106      2.12           O18···H106-O30       167.65
                                          O29···H116      2.22           O29···H116-O41       144.78
                                          O63···H125      2.19           O63···H125-O51       140.10
                                          O40···H126      2.13           O40···H126-O52       166.80
                                          O74···H135      2.21           O74···H135-O62       178.60

α-CD + phenol "head first"     6          O19···H74       2.61           O19···H74-O7         96.01
                                          (O7···H85       2.20)          (O7···H85-O19        126.00)
                                          O18···H95       2.16           O18···H95-O30        164.62
                                          O29···H105      2.18           O29···H105-O41       158.94
                                          O40···H115      2.15           O40···H115-O52       162.51
                                          O63···H114      2.31           O63···H114-O51       141.24
                                          O8···H124       2.24           O8···H124-O62        145.72

α-CD + phenol "tail first"     6          O7···H85        2.28           O7···H85-O19         146.97
                                          O18···H95       2.11           O18···H95-O30        146.78
                                          O29···H105      2.15           O29···H105-O41       166.68
                                          O40···H115      2.20           O40···H115-O52       159.94
                                          O63···H114      2.14           O63···H114-O51       149.78
                                          O8···H124       2.11           O8···H124-O62        156.43

β-CD + phenol "head first"     7          O73···H86       2.13           O73···H86-O8         169.94
                                          O7···H96        2.25           O7···H96-O19         151.88
                                          O18···H106      2.15           O18···H106-O30       169.41
                                          O29···H116      2.15           O29···H116-O41       159.42
                                          O40···H126      2.21           O40···H126-O52       167.12
                                          O51···H136      2.92           O51···H136-O63       106.00
                                          O62···H146      2.11           O62···H146-O74       159.49

β-CD + phenol "tail first"     7          O73···H86       2.15           O73···H86-O8         149.11
                                          O7···H96        2.14           O7···H96-O19         159.12
                                          O18···H106      2.14           O18···H106-O30       166.70
                                          O29···H116      2.18           O29···H116-O41       144.54
                                          O63···H125      2.20           O63···H125-O51       147.82
                                          O40···H126      2.19           O40···H126-O52       159.69
                                          O74···H135      2.11           O74···H135-O62       175.94


TABLE V

Bond  distance  (A)  for  each  intramolecular  hydrogen-bond  interaction  between  the  2-hydroxyl  or  the 
3-hydroxyl  groups  and  nearby  oxygen  bridges  of  the  host  molecule  in  the  X-ray  conformer,  AMI  -optimized 
conformer,  and  in  eight  inclusion  complexes  with  the  cutoff  criteria  for  hydrogen-bond  O  —  H  •  •  •  O  interaction 
with  H  •  •  •  O  distances  equal  or  less  than  3.00  A  and  angles  at  H  larger  than  90°. 

No.  Atomic  no.  and 


Host  molecule 

o 

X 

1 

o 

H  ••• 

0  bond  length 

Angles  (degrees)  at  H 

a-CD  (X-ray) 

5 

020 

•••  H85  2.40  A 

020 

••■  H85  — 019  =  105.78 

031 

—  H95  2.44  A 

031 

-H95  —  030  =  102.18 

042 

•••  H105  2.58  A 

042 

•••HI  05  —  041  =98.29 

064 

•♦*  H1 14  2.46  A 

064 

•  H1 14  —  051  =102.17 

053 

-  -  H1 15  2.66  A 

053 

-H115  —  052  =  93.07 

a-CD  (AMI) 

7 

09- 

•  H75  2.73  A 

09  •• 

•H75  — 08  =  94.06 

020 

•••  H85  2.49  A 

020 

■H85  — 019  =  106.01 

031 

•••  H95  2.47  A 

031 

•••H95  — 030  =  105.68 

042 

H105  2.73  A 

042 

-H105  —  041  =95.11 

064 

•••  H1 14  2.50  A 

064 

-H114  —  051  =95.88 

053 

-H1 15  2.52  A 

053 

•■•H1 15  —  052  =  103.51 

064 

•••  HI  25  2.46  A 

064 

-H125  —  063  =  105.04 

/3-CD  (X-ray) 

7 

020 

•••  H85  2.31  A 

020 

...  H85  —  07  =  105.74 

09- 

•  H86  2.51  A 

09  •• 

•  H86  —  08  =  97.67 

031 

••*  H95  2.28  A 

031 

•••  H95  — 018  =  105.14 

042 

—  H105  2.32  A  . 

042 

-H105  —  029  =  108.35 

053 

•••H1 15  2.54  A 

053 

•••H1 15  —  040  =  95.45 

064 

—  H125  2.51  A 

064 

-H125  —  051  =94.95 

075 

•••  H146  2.43  A 

075 

-H146  —  074  =  103.57 

/3-CD  (AMI) 

7 

09- 

•  H86  2.78  A 

09- 

•H86  —  08  =  92.47 

020 

•*♦  H96  2.46  A 

020 

•  -H96  —  019  =  107.34 

031 

•••  H106  2.64  A 

031 

••■  HI  06  —  030  =  98.64 

042 

•••  H1 1 6  2.43  A 

042 

•••  H1 16  —  041  =  108.05 

064 

*•*  H125  2.48  A 

064 

•••H125  —  051  =101.45 

053 

•••  H126  2.63  A 

053 

•••H126  —  052  =  96.92 

075 

•••  H135  2.46  A 

075 

•••H135  — 062  =  106.66 

a-CD  +  BA  “headfirst” 

7 

09- 

■•  H75  2.62  A 

09- 

H75  — 08  =  99.86 

020 

•••  H85  2.51  A 

020 

•••H85  — 019  =  105.08 

031 

•••  H95  2.54  A 

031 

•■•H95  — 030  =  102.50 

042 

■••HI  05  2.44  A 

042 

•••H105  — 041  =  107.44 

064 

•••H1 14  2.69  A 

064 

•••H114  —  051  =92.54 

053 

•••  H1 15  2.63  A 

053 

•••H1 15  —  052  =  99.96 

064 

*••  H125  2.44  A 

064 

-H125  —  063  =  103.55 

a-CD  +  BA  “tail  first” 

7 

09- 

•  H75  2.82  A 

09- 

•H75  — 08  =  93.93 

020 

■••  H85  2.49  A 

020 

•H85  — 019  =  106.36 

031 

•••  H95  2.60  A 

031 

•••H95  —  030  =  102.26 

042 

•••  H105  2.58  A 

042 

■-H105  — 041  =  101.76 

064 

•••  H114  2.63  A 

064 

•••H114  — 051  =98.03 

053 

•••  H1 15  2.58  A 

053 

•••H115  — 052  =  102.11 

09- 

•HI 24  2.51  A 

09- 

•  HI  24  —  062  =  95.57 
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TABLE  V 

(Continued). 

Host  molecule 

No. 

0  — H-*0 

Atomic  no.  and 

H  •••  0  bond  length 

Angles  (degrees)  at  H 

/3-CD  +  BA  “head  first” 

8 

09** 

•  H86  2.80  A 

09- 

H86  — 08  =  92.12 

020 

*  H96  2.42  A 

020 

•H96  — 019  =  108.48 

031 

•HI  06  2.66  A 

031 

•HI  06  —  030  =  98.90 

042 

*  H1 16  2.43  A 

042 

H116  —  041  =  107.90 

064 

•HI  25  2.43  A 

064- 

HI  25  — 051  =  104.23 

053 

•HI 26  2.71  A 

053' 

•HI  26  —  052  =  93.01 

075 

HI 35  2.46  A 

075 

H135  —  062  =  105.14 

09  •• 

HI  45  2.34  A 

09  •• 

HI  45  —  073  =  101.83 

β-CD + BA “tail first”        7   O9···H86    2.80 Å   O9···H86-O8     =  92.47
                                  O20···H96   2.44 Å   O20···H96-O19   = 108.02
                                  O31···H106  2.65 Å   O31···H106-O30  =  98.12
                                  O42···H116  2.45 Å   O42···H116-O41  = 107.31
                                  O64···H125  2.46 Å   O64···H125-O51  = 102.86
                                  O53···H126  2.65 Å   O53···H126-O52  =  96.82
                                  O75···H135  2.46 Å   O75···H135-O62  = 106.00

α-CD + phenol “head first”    6   O20···H85   2.52 Å   O20···H85-O19   =  99.58
                                  O31···H95   2.76 Å   O31···H95-O30   =  97.56
                                  O42···H105  2.46 Å   O42···H105-O41  = 106.14
                                  O64···H114  2.38 Å   O64···H114-O51  = 105.27
                                  O53···H115  2.56 Å   O53···H115-O52  = 100.58
                                  O9···H124   2.43 Å   O9···H124-O62   = 108.05

α-CD + phenol “tail first”    5   O20···H85   2.55 Å   O20···H85-O19   = 100.51
                                  O31···H95   2.52 Å   O31···H95-O30   = 104.74
                                  O42···H105  2.67 Å   O42···H105-O41  =  97.43
                                  O64···H114  2.59 Å   O64···H114-O51  =  99.14
                                  O53···H115  2.52 Å   O53···H115-O52  = 104.22

β-CD + phenol “head first”    8   O9···H86    2.68 Å   O9···H86-O8     =  99.54
                                  O20···H96   2.43 Å   O20···H96-O19   = 107.71
                                  O31···H106  2.62 Å   O31···H106-O30  =  99.09
                                  O42···H116  2.49 Å   O42···H116-O41  = 105.88
                                  O53···H126  2.69 Å   O53···H126-O52  =  99.28
                                  O75···H135  2.54 Å   O75···H135-O62  =  90.13
                                  O64···H136  2.39 Å   O64···H136-O63  = 107.01
                                  O75···H146  2.73 Å   O75···H146-O74  =  94.47

β-CD + phenol “tail first”    6   O20···H96   2.46 Å   O20···H96-O19   = 107.34
                                  O31···H106  2.68 Å   O31···H106-O30  =  97.11
                                  O42···H116  2.46 Å   O42···H116-O41  = 107.81
                                  O64···H125  2.42 Å   O64···H125-O51  = 104.65
                                  O53···H126  2.74 Å   O53···H126-O52  =  91.97
                                  O75···H135  2.53 Å   O75···H135-O62  = 103.24
TABLE VI


Bond distance (Å) for each hydrogen-bond interaction between the guest and host of eight inclusion
complexes, with the cutoff criteria for a hydrogen-bond O-H···O interaction being H···O distances equal to or
less than 3.00 Å and angles at H larger than 90°.

Inclusion complexes           No. O-H···O   Atomic no. and H···O bond length   Angles (degrees) at H

α-CD + BA “head first”        4   O11···H141 (1°)   2.43 Å   O11···H141-O140  = 119.58
                                  O33···H141 (1°)   2.37 Å   O33···H141-O140  = 107.04
                                  H76···O139 (1°)   2.10 Å   O11-H76···O139   = 137.80
                                  H116···O139 (1°)  2.21 Å   O55-H116···O139  = 158.94

α-CD + BA “tail first”        1   H124···O139 (2°)  2.22 Å   O62-H124···O139  = 158.24

β-CD + BA “head first”        0

β-CD + BA “tail first”        1   H146···O161 (2°)  2.46 Å   O74-H146···O161  = 152.06

α-CD + phenol “head first”    2   O55···H139 (1°)   2.13 Å   O55···H139-O133  = 124.77
                                  H76···O133 (1°)   2.14 Å   O11-H76···O133   = 165.70
                                  H116···O133 (1°)  2.88 Å   O55-H116···O133  =  75.62

α-CD + phenol “tail first”    0

β-CD + phenol “head first”    0

β-CD + phenol “tail first”    3   H145···O154 (2°)  2.48 Å   O73-H145···O154  =  96.31
                                  O73···H160 (2°)   2.26 Å   O73···H160-O154  = 110.79
                                  H146···O154 (2°)  2.25 Å   O74-H146···O154  = 123.19
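
The cutoff stated in the caption of Table VI (an H···O distance of at most 3.00 Å and an angle at H larger than 90°) can be written as a small geometric test. The following is a minimal sketch in Python with NumPy, not the authors' code: the function name, argument names, and the example coordinates are hypothetical, and only the two thresholds come from the caption.

```python
import numpy as np

def is_hydrogen_bond(donor_o, h, acceptor_o, max_dist=3.00, min_angle=90.0):
    """Apply the Table VI cutoff to one O-H...O contact.

    Returns (is_bond, H...O distance in angstroms, angle at H in degrees).
    Coordinates are Cartesian triples in angstroms.
    """
    donor_o, h, acceptor_o = (np.asarray(p, dtype=float)
                              for p in (donor_o, h, acceptor_o))
    v_donor = donor_o - h        # vector from H to the covalently bound O
    v_acceptor = acceptor_o - h  # vector from H to the acceptor O
    dist = float(np.linalg.norm(v_acceptor))
    cos_angle = np.dot(v_donor, v_acceptor) / (np.linalg.norm(v_donor) * dist)
    # Clip guards against rounding slightly outside [-1, 1].
    angle = float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
    return bool(dist <= max_dist and angle > min_angle), dist, angle

# Hypothetical collinear contact: donor O, H at the origin, acceptor O 2.2 Å away.
print(is_hydrogen_bond([-0.96, 0.0, 0.0], [0.0, 0.0, 0.0], [2.2, 0.0, 0.0]))
```

Under this test, a contact such as the 2.88 Å, 75.62° entry listed above would satisfy the distance cutoff but not the angle cutoff.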

TABLE VII

Total number of intramolecular hydrogen bonds between the 2-hydroxyl and 3-hydroxyl groups of
adjacent glucose units (intra 2-3), total number of intramolecular hydrogen bonds between the 2-hydroxyl
and 3-hydroxyl groups and their nearby oxygen bridges (intra bridge), total number of intermolecular
hydrogen bonds between the host and guest (inter), and the macrocyclic geometric constants O(4)···O(4')
and O(4)···O(4')···O(4'').

Inclusion complexes           No.         No. intra   No.     O(4)···O(4')   O(4)···O(4')···O(4'')
                              intra 2-3   bridges     inter   (Å)            (deg.)

α-CD (X-ray)                  5           5                   4.246          119.658
α-CD (AM1)                    5           7                   4.186          118.781
β-CD (X-ray)                  7           7                   4.385          128.330
β-CD (AM1)                    7           7                   4.285          128.301
α-CD + BA “head first”        6           7           4       4.218          119.954
α-CD + BA “tail first”        6           7           1       4.244          119.871
β-CD + BA “head first”        7           8           0       4.276          128.226
β-CD + BA “tail first”        7           7           1       4.286          128.353
α-CD + phenol “head first”    6           6           2       4.223          119.983
α-CD + phenol “tail first”    6           5           0       4.170          119.842
β-CD + phenol “head first”    7           8           0       4.265          127.605
β-CD + phenol “tail first”    7           6           3       4.254          128.325
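
As a rough plausibility check on the O(4)···O(4')···O(4'') angles in Table VII, they can be compared with the interior angle of a regular polygon with one vertex per glucose unit (six for α-CD, seven for β-CD). This comparison is an illustration added here, not part of the original analysis; it assumes a planar, perfectly symmetric ring of O(4) atoms, which a real macrocycle need not be. The function and dictionary names are hypothetical.

```python
def regular_polygon_angle(n_units: int) -> float:
    """Interior angle (degrees) of a regular polygon with n_units vertices."""
    if n_units < 3:
        raise ValueError("a polygon needs at least three vertices")
    return (n_units - 2) * 180.0 / n_units

# AM1 values for the isolated hosts, copied from Table VII.
AM1_ANGLES = {"alpha-CD": (6, 118.781), "beta-CD": (7, 128.301)}

for name, (n_units, am1_angle) in AM1_ANGLES.items():
    ideal = regular_polygon_angle(n_units)
    print(f"{name}: ideal {ideal:.3f}, AM1 {am1_angle:.3f}, "
          f"difference {ideal - am1_angle:.3f}")
```

The tabulated angles lie within about 1.5° of the idealized values, consistent with nearly regular macrocycles in both the X-ray and AM1 structures.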
TABLE VIII


Heats of formation (ΔHf, kcal/mol) from 1SCF AM1 calculations for the β-CD, benzoic acid, and phenol
obtained by removing the host or guest molecules from their optimized configuration; energy differences
from their optimized minimum energies (ΔΔHf) are in kcal/mol.

Compound                      ΔHf             ΔΔHf1 (a)   ΔHf              ΔΔHf2 (b)
                              1SCF for host               1SCF for guest

Benzoic acid (BA)                                         -68.0
Phenol                                                    -22.2
α-CD                          -1416.0
β-CD                          -1656.6
α-CD + BA “head first”        -1411.8         4.2         -67.7            0.3
α-CD + BA “tail first”        -1410.1         5.9         -67.6            0.4
β-CD + BA “head first”        -1655.0         1.6         -67.9            0.1
β-CD + BA “tail first”        -1647.6         9.0         -67.8            0.2
α-CD + phenol “head first”    -1413.2         2.8         -21.8            0.4
α-CD + phenol “tail first”    -1408.9         7.1         -22.1            0.1
β-CD + phenol “head first”    -1652.4         4.2         -22.2            0.0
β-CD + phenol “tail first”    -1650.9         5.7         -22.2            0.0

(a) ΔΔHf1 = ΔHf (1SCF for host) - ΔHf (optimized host).
(b) ΔΔHf2 = ΔHf (1SCF for guest) - ΔHf (optimized guest).
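
The two footnote definitions can be checked directly against the tabulated values. The sketch below is illustrative only: the dictionary layout and function name are hypothetical, the numbers are copied from Table VIII, and only a subset of the complexes is included.

```python
# Optimized heats of formation (kcal/mol) of the isolated molecules, Table VIII.
OPTIMIZED = {"alpha-CD": -1416.0, "beta-CD": -1656.6, "BA": -68.0, "phenol": -22.2}

# (host, guest, orientation) -> (1SCF dHf of host, 1SCF dHf of guest), Table VIII.
ONE_SCF = {
    ("alpha-CD", "BA", "head first"): (-1411.8, -67.7),
    ("alpha-CD", "BA", "tail first"): (-1410.1, -67.6),
    ("beta-CD", "BA", "head first"): (-1655.0, -67.9),
    ("beta-CD", "BA", "tail first"): (-1647.6, -67.8),
}

def deformation_energies(host, guest, orientation):
    """Return (ddHf1, ddHf2): 1SCF value minus optimized value, host then guest."""
    host_1scf, guest_1scf = ONE_SCF[(host, guest, orientation)]
    dd_host = round(host_1scf - OPTIMIZED[host], 1)
    dd_guest = round(guest_1scf - OPTIMIZED[guest], 1)
    return dd_host, dd_guest

for key in ONE_SCF:
    print(key, deformation_energies(*key))
```

In every row the host term is larger than the guest term, which never exceeds 0.4 kcal/mol in Table VIII.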


and O2-C2-C3 bond angles and C1-C2-C3-C4, C5-O5-C1-C2, and O5-C1-C2-C3
dihedral angles. The average bond angles and dihedral angles for the β-CD
with phenol in the "tail-first" position are very close to those in the
isolated β-CD except for the C1-C2-C3, O5-C1-C2, and O3-C3-C4 bond angles
and C1-C2-C3-C4, C5-O5-C1-C2, O5-C1-C2-C3, and O2-C2-C3-C4 dihedral angles.

The final stage of this work involved calculating the enthalpy (by
single-point AM1 calculation) of the complexes as the guest is moved out
of the host. The geometries of guest and host were fixed at their optimum
values for the complex, and the guest was moved along the host's principal
inertial axis with the largest moment of inertia. This axis is roughly
perpendicular to the plane of the CD. We chose it to be the Z-axis of the
host principal axis system. Once the geometry of the system is expressed
in terms of the host principal axis system, it is trivial to translate the
guest along the Z-axis. We define Z = 0.00 to be the AM1-optimized
geometry. The heats of formation from single-point AM1 calculations for
the β-CD with benzoic acid in the "tail-first" and "head-first" positions,
with the guest moved stepwise along the Z-axis of the host principal axis
coordinate system, are listed in Tables XIII and XIV, respectively. In
Figures 2 and 3, we show the single-point AM1-calculated energy profile
vs. the displacement of the guest along the Z-axis of the host principal
axis coordinate system. Figure 2 shows the effect of displacing the
benzoic acid from its optimum position in the β-CD with benzoic acid
(tail-first) complex. Movement in the +Z direction corresponds to movement
toward the secondary alcohol, and movement in the -Z direction means
movement toward the primary alcohol. When the guest moves in the -Z
direction, ΔHf increases more slowly than when the guest moves in the +Z
direction. When the guest has moved 3 Å in the +Z direction, the benzene
ring is barely interacting with the secondary alcohol, and the rest of the
guest is outside of the cavity. At displacements of 2 and 1 Å in the +Z
direction, the benzene ring is deeper inside the cavity and the inclusion
complex is more stable, until the minimum-energy conformer is reached.
After the minimum configuration is reached, moving the guest in the -Z
direction increases the heat of formation more slowly than it decreased
while the benzene ring was moving into the cavity. In Figure 3, the
behavior of the β-CD with benzoic acid (head-first) complex is shown. When
the carboxylic acid group starts to enter the host from the secondary
alcohol (+3 Å position), the ΔHf is about 0.7 kcal/mol higher than that of
the most stable configuration. Then, moving the carboxylic acid group
deeper into the host cavity, which means
TABLE IX

AM1 structural parameters of C6O5 subunits of the inclusion complex of α-CD and benzoic acid and those
of α-CD; average values of each parameter and their ranges are given:

                      α-CD + benzoic acid             α-CD
                      Ave.      Range                 Ave.      Range

Bond length (Å)
  C1-C2               1.543     [1.542, 1.546]        1.542     [1.540, 1.544]
  C2-C3               1.534     [1.531, 1.535]        1.533     [1.530, 1.536]
  C3-C4               1.540     [1.538, 1.542]        1.539     [1.537, 1.542]
  C4-C5               1.537     [1.534, 1.539]        1.536     [1.534, 1.539]
  C5-C6               1.534     [1.532, 1.536]        1.534     [1.532, 1.538]
  C5-O5               1.432     [1.428, 1.436]        1.432     [1.430, 1.434]
  O5-C1               1.411     [1.409, 1.414]        1.412     [1.408, 1.414]
  C1-O1               1.415     [1.412, 1.420]        1.414     [1.413, 1.416]
  C2-O2               1.414     [1.408, 1.417]        1.414     [1.410, 1.416]
  C3-O3               1.416     [1.414, 1.418]        1.417     [1.414, 1.419]

Bond angle (deg.)
  C1-C2-C3            109.6     [108.7, 110.2]        109.6     [109.0, 110.6]
  C2-C3-C4            110.2     [109.0, 111.1]        110.5     [109.8, 111.2]
  C3-C4-C5            110.8     [109.7, 112.3]        111.3     [109.9, 112.5]
  C4-C5-C6            111.6     [110.7, 112.4]        111.7     [110.9, 112.8]
  O5-C1-C2            110.8     [110.3, 111.1]        110.9     [110.1, 112.4]
  O6-C6-C5            111.4     [110.4, 112.5]        111.5     [110.3, 112.1]
  O2-C2-C3            111.6     [110.9, 113.1]        111.5     [110.9, 112.0]
  O3-C3-C4            111.0     [110.3, 111.8]        110.9     [110.4, 111.1]

Dihedral angle (deg.)
  C1-C2-C3-C4         -54.3     [-57.2, -51.8]        -53.3     [-56.2, -48.8]
  C2-C3-C4-C5          52.1     [47.8, 55.8]           50.9     [48.4, 53.5]
  C3-C4-C5-O5         -51.1     [-53.6, -46.8]        -50.3     [-56.1, -44.0]
  C4-C5-O5-C1          55.8     [54.1, 57.8]           55.5     [50.2, 60.2]
  C5-O5-C1-C2         -58.4     [-60.6, -56.2]        -58.7     [-61.1, -57.8]
  O5-C1-C2-C3          56.4     [54.3, 59.4]           56.4     [50.4, 59.5]
  O2-C2-C3-C4        -173.7     [-177.4, -170.5]     -172.6     [-175.5, -170.3]
  O3-C3-C4-C5         170.2     [166.0, 173.7]        168.7     [166.0, 171.1]
TABLE X

AM1 structural parameters of C6O5 subunits of the inclusion complex of α-CD and phenol and those
of α-CD; average values of each parameter and their ranges are given:

                      α-CD + phenol                   α-CD
                      Ave.      Range                 Ave.      Range

Bond length (Å)
  C1-C2               1.542     [1.539, 1.544]        1.542     [1.540, 1.544]
  C2-C3               1.535     [1.533, 1.537]        1.533     [1.530, 1.536]
  C3-C4               1.540     [1.538, 1.543]        1.539     [1.537, 1.542]
  C4-C5               1.538     [1.534, 1.539]        1.536     [1.534, 1.539]
  C5-C6               1.533     [1.531, 1.535]        1.534     [1.532, 1.538]
  C5-O5               1.432     [1.429, 1.436]        1.432     [1.430, 1.434]
  O5-C1               1.411     [1.407, 1.415]        1.412     [1.408, 1.414]
  C1-O1               1.415     [1.412, 1.418]        1.414     [1.413, 1.416]
  C2-O2               1.414     [1.407, 1.421]        1.414     [1.410, 1.416]
  C3-O3               1.418     [1.415, 1.419]        1.417     [1.414, 1.419]

Bond angle (deg.)
  C1-C2-C3            110.3     [109.5, 111.8]        109.6     [109.0, 110.6]
  C2-C3-C4            110.7     [108.6, 112.2]        110.5     [109.8, 111.2]
  C3-C4-C5            111.1     [109.4, 113.4]        111.3     [109.9, 112.5]
  C4-C5-C6            110.9     [110.2, 111.8]        111.7     [110.9, 112.8]
  O5-C1-C2            111.2     [109.1, 113.0]        110.9     [110.1, 112.4]
  O6-C6-C5            111.5     [110.7, 112.1]        111.5     [110.3, 112.1]
  O2-C2-C3            110.3     [105.5, 112.1]        111.5     [110.9, 112.0]
  O3-C3-C4            109.2     [106.2, 111.6]        110.9     [110.4, 111.1]

Dihedral angle (deg.)
  C1-C2-C3-C4         -52.2     [-57.2, -47.7]        -53.3     [-56.2, -48.8]
  C2-C3-C4-C5          50.1     [41.1, 57.5]           50.9     [48.4, 53.5]
  C3-C4-C5-O5         -50.0     [-55.9, -40.1]        -50.3     [-56.1, -44.0]
  C4-C5-O5-C1          55.0     [50.6, 57.9]           55.5     [50.2, 60.2]
  C5-O5-C1-C2         -57.3     [-62.3, -55.0]        -58.7     [-61.1, -57.8]
  O5-C1-C2-C3          54.9     [50.7, 60.4]           56.4     [50.4, 59.5]
  O2-C2-C3-C4        -173.2     [-176.3, -169.3]     -172.6     [-175.5, -170.3]
  O3-C3-C4-C5         168.8     [159.1, 174.7]        168.7     [166.0, 171.1]
TABLE XI

AM1 structural parameters of C6O5 subunits of the inclusion complex of β-CD and phenol and those
of β-CD; average values of each parameter and their ranges are given:

                      β-CD + phenol                   β-CD
                      Ave.      Range                 Ave.      Range

Bond length (Å)
  C1-C2               1.541     [1.538, 1.545]        1.541     [1.538, 1.544]
  C2-C3               1.536     [1.534, 1.537]        1.535     [1.534, 1.536]
  C3-C4               1.538     [1.537, 1.540]        1.538     [1.537, 1.540]
  C4-C5               1.538     [1.535, 1.541]        1.537     [1.535, 1.538]
  C5-C6               1.533     [1.531, 1.536]        1.533     [1.532, 1.534]
  C5-O5               1.432     [1.430, 1.433]        1.431     [1.430, 1.432]
  O5-C1               1.412     [1.407, 1.417]        1.411     [1.405, 1.414]
  C1-O1               1.418     [1.413, 1.425]        1.417     [1.413, 1.421]
  C2-O2               1.413     [1.407, 1.417]        1.413     [1.406, 1.416]
  C3-O3               1.417     [1.416, 1.418]        1.417     [1.416, 1.418]

Bond angle (deg.)
  C1-C2-C3            109.8     [109.0, 110.4]        110.4     [109.6, 112.1]
  C2-C3-C4            109.7     [108.4, 110.4]        110.0     [108.9, 111.2]
  C3-C4-C5            110.5     [108.4, 112.0]        110.6     [110.1, 111.6]
  C4-C5-C6            111.6     [110.3, 113.8]        111.3     [110.8, 111.8]
  O5-C1-C2            111.4     [110.6, 112.8]        112.0     [111.0, 113.8]
  O6-C6-C5            112.3     [111.7, 114.3]        112.0     [111.6, 112.5]
  O2-C2-C3            111.2     [110.4, 112.1]        111.2     [110.3, 112.4]
  O3-C3-C4            110.5     [107.1, 112.8]        109.9     [106.8, 112.0]

Dihedral angle (deg.)
  C1-C2-C3-C4         -54.5     [-57.9, -51.6]        -52.7     [-54.5, -46.7]
  C2-C3-C4-C5          53.6     [49.6, 58.7]           53.3     [51.8, 55.6]
  C3-C4-C5-O5         -52.9     [-57.1, -46.6]        -53.7     [-56.3, -51.8]
  C4-C5-O5-C1          55.9     [52.2, 58.6]           56.2     [54.5, 58.1]
  C5-O5-C1-C2         -57.3     [-58.8, -55.6]        -56.0     [-57.5, -53.2]
  O5-C1-C2-C3          55.7     [53.3, 59.5]           53.3     [46.6, 55.8]
  O2-C2-C3-C4        -174.3     [-177.9, -172.2]     -173.3     [-176.0, -169.2]
  O3-C3-C4-C5         171.7     [166.6, 176.6]        171.6     [170.2, 173.2]
TABLE XII

AM1 structural parameters of C6O5 subunits of the inclusion complex of β-CD and benzoic acid and those
of β-CD; average values of each parameter and their ranges are given:

                      β-CD + benzoic acid             β-CD
                      Ave.      Range                 Ave.      Range

Bond length (Å)
  C1-C2               1.541     [1.538, 1.544]        1.541     [1.538, 1.544]
  C2-C3               1.535     [1.534, 1.536]        1.535     [1.534, 1.536]
  C3-C4               1.538     [1.536, 1.540]        1.538     [1.537, 1.540]
  C4-C5               1.537     [1.535, 1.539]        1.537     [1.535, 1.538]
  C5-C6               1.533     [1.532, 1.535]        1.533     [1.532, 1.534]
  C5-O5               1.431     [1.430, 1.433]        1.431     [1.430, 1.432]
  O5-C1               1.410     [1.405, 1.413]        1.411     [1.405, 1.414]
  C1-O1               1.417     [1.414, 1.422]        1.417     [1.413, 1.421]
  C2-O2               1.413     [1.407, 1.417]        1.413     [1.406, 1.416]
  C3-O3               1.418     [1.416, 1.419]        1.417     [1.416, 1.418]

Bond angle (deg.)
  C1-C2-C3            110.2     [108.9, 112.1]        110.4     [109.6, 112.1]
  C2-C3-C4            110.2     [108.6, 111.5]        110.0     [108.9, 111.2]
  C3-C4-C5            110.9     [109.8, 112.0]        110.6     [110.1, 111.6]
  C4-C5-C6            111.3     [110.0, 111.9]        111.3     [110.8, 111.8]
  O5-C1-C2            111.9     [110.5, 113.4]        112.0     [111.0, 113.8]
  O6-C6-C5            112.2     [111.6, 112.9]        112.0     [111.6, 112.5]
  O2-C2-C3            111.2     [110.2, 112.6]        111.2     [110.3, 112.4]
  O3-C3-C4            109.9     [106.8, 112.7]        109.9     [106.8, 112.0]

Dihedral angle (deg.)
  C1-C2-C3-C4         -52.5     [-56.1, -46.1]        -52.7     [-54.5, -46.7]
  C2-C3-C4-C5          52.4     [50.3, 56.4]           53.3     [51.8, 55.6]
  C3-C4-C5-O5         -52.6     [-56.0, -47.1]        -53.7     [-56.3, -51.8]
  C4-C5-O5-C1          55.9     [52.6, 59.2]           56.2     [54.5, 58.1]
  C5-O5-C1-C2         -56.7     [-58.9, -54.8]        -56.0     [-57.5, -53.2]
  O5-C1-C2-C3          54.0     [47.2, 59.2]           53.3     [46.6, 55.8]
  O2-C2-C3-C4        -173.0     [-175.8, -168.5]     -173.3     [-176.0, -169.2]
  O3-C3-C4-C5         170.8     [167.8, 175.0]        171.6     [170.2, 173.2]
TABLE XIII

Heats of formation from 1SCF AM1 calculations
for the β-CD with benzoic acid in the “tail-first”
position with the guest moved stepwise along the Z-axis of the
host principal axis coordinate system.

Inclusion complexes                          ΔHf (kcal/mol)

β-CD + benzoic acid at +3 of the Z-axis      -1694.8
β-CD + benzoic acid at +2 of the Z-axis      -1706.3
β-CD + benzoic acid at +1 of the Z-axis      -1714.0
β-CD + benzoic acid at 0 of the Z-axis       -1716.9
β-CD + benzoic acid at -1 of the Z-axis      -1714.7
β-CD + benzoic acid at -2 of the Z-axis      -1713.8
β-CD + benzoic acid at -3 of the Z-axis      -1714.6


TABLE XIV

Heats of formation from 1SCF AM1 calculations
for the β-CD with benzoic acid in the “head-first”
position with the guest moved stepwise along the Z-axis of the
host principal axis coordinate system.

Inclusion complexes                          ΔHf (kcal/mol)

β-CD + benzoic acid at +3 of the Z-axis      -1724.3
β-CD + benzoic acid at +2 of the Z-axis      -1723.3
β-CD + benzoic acid at +1 of the Z-axis      -1724.2
β-CD + benzoic acid at 0 of the Z-axis       -1725.0
β-CD + benzoic acid at -1 of the Z-axis      -1723.1
β-CD + benzoic acid at -2 of the Z-axis      -1720.5
β-CD + benzoic acid at -3 of the Z-axis      -1718.0
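
To read the two scans in Tables XIII and XIV as energy profiles, it helps to express each ΔHf relative to the minimum of its own scan. The sketch below does only that arithmetic; the variable and function names are hypothetical, and the values are copied from the two tables.

```python
# Guest displacement along the host Z-axis (angstroms), in table order.
Z_STEPS = [3, 2, 1, 0, -1, -2, -3]

# Heats of formation (kcal/mol) from Table XIII (tail-first) and Table XIV (head-first).
TAIL_FIRST = [-1694.8, -1706.3, -1714.0, -1716.9, -1714.7, -1713.8, -1714.6]
HEAD_FIRST = [-1724.3, -1723.3, -1724.2, -1725.0, -1723.1, -1720.5, -1718.0]

def relative_profile(heats):
    """Return each heat of formation minus the scan minimum, rounded to 0.1 kcal/mol."""
    lowest = min(heats)
    return [round(h - lowest, 1) for h in heats]

for label, heats in (("tail-first", TAIL_FIRST), ("head-first", HEAD_FIRST)):
    profile = dict(zip(Z_STEPS, relative_profile(heats)))
    print(label, profile)
```

For the head-first scan this gives 0.7 kcal/mol at +3 Å, matching the value quoted in the text; for the tail-first scan the +3 Å point lies 22.1 kcal/mol above the minimum, while the -3 Å point lies only 2.3 kcal/mol above it.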


that a greater portion of the benzene ring is in the
host cavity, stabilization of the whole complex
increases until it reaches the minimum-energy
configuration. Then, when the carboxylic acid
group is moved out from the primary alcohol side, a
greater portion of the benzene ring is out of the
cavity, causing the heat of formation to increase.
This may be due to the hydrophobic interaction
between the host and guest.
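
The displacement procedure described above (express the system in the host principal axis frame, then translate the guest along the axis with the largest moment of inertia) can be sketched as follows. This is a minimal illustration in Python with NumPy under stated assumptions, not the authors' implementation: the function names are hypothetical, and the host here is a toy ring of six unit masses rather than a cyclodextrin. The sign of an eigenvector is arbitrary, so which direction counts as +Z (toward the secondary alcohol in the text) has to be fixed separately.

```python
import numpy as np

def principal_axes(coords, masses):
    """Return (center of mass, principal moments, axes as columns).

    Moments are in ascending order, so the last column of the axes matrix
    is the axis with the largest moment of inertia.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    com = np.average(coords, axis=0, weights=masses)
    r = coords - com
    # Inertia tensor: I = sum_i m_i (|r_i|^2 * identity - r_i r_i^T)
    trace_term = np.sum(masses * np.sum(r * r, axis=1))
    inertia = trace_term * np.eye(3) - np.einsum("i,ij,ik->jk", masses, r, r)
    moments, axes = np.linalg.eigh(inertia)
    return com, moments, axes

def displace_guest(host_xyz, host_masses, guest_xyz, dz):
    """Translate the guest rigidly by dz along the host's largest-moment axis."""
    _, _, axes = principal_axes(host_xyz, host_masses)
    z_axis = axes[:, -1]
    return np.asarray(guest_xyz, dtype=float) + dz * z_axis

# Toy host: six unit masses on a planar ring (assumption, stands in for the CD).
ring_angles = np.arange(6) * np.pi / 3
ring = np.column_stack([np.cos(ring_angles), np.sin(ring_angles), np.zeros(6)])
guest = np.array([[0.0, 0.0, 0.0]])

for dz in (-3, -2, -1, 0, 1, 2, 3):
    moved = displace_guest(ring, np.ones(6), guest, dz)
    print(dz, np.round(moved[0], 3))
```

Each displaced geometry would then be passed to a single-point energy calculation with both host and guest internal geometries held fixed, as in the scans of Tables XIII and XIV.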


In conclusion, we studied the inclusion complexes
of α-CD and β-CD with benzoic acid and
phenol in the "head-first" and "tail-first" positions.
The driving forces for complex formation
were investigated by examining combinations of
different intermolecular interactions such as steric
fit, dipole-dipole interaction, intramolecular hydrogen
bonding, intermolecular hydrogen bonding,
and the enthalpies of formation of host and
guest molecules calculated at their geometries in
the complex and at their optimized geometries.


FIGURE 2. Plot of 1SCF AM1-calculated ΔHf vs. displacement of the guest along the Z-axis (distance in Å) of the host principal axis
coordinate system for β-CD with benzoic acid in the “tail first” position.

FIGURE 3. Plot of 1SCF AM1-calculated ΔHf vs. displacement of the guest along the Z-axis (distance in Å) of the host principal axis
coordinate system for β-CD with benzoic acid in the “head first” position.